Method and Apparatus for Correlating Levels of Biomarker Products with Disease

ABSTRACT

In one aspect, the invention is a method of testing for one or more colorectal pathologies or one or more subtypes of colorectal pathology (in one embodiment, colorectal cancer) in a test individual by providing data corresponding to a level of products of selected biomarkers and applying the data to a formula to provide an indication of whether the test individual has one or more colorectal pathologies or one or more subtypes of colorectal pathology. In some aspects, the method is computer based and a computer applies the data to the formula. In other aspects, a computer system is configured with instructions that cause a processor to provide a user with the indication of whether the test individual has colorectal pathology. Also encompassed are kits for measuring data corresponding to the products of selected biomarkers, which in some embodiments include a computer-readable medium. Also encompassed are kits and methods for monitoring the therapeutic efficacy of treatments for one or more colorectal pathologies.

PRIORITY CLAIM

This application is a divisional application of U.S. application Ser. No. 11/585,666, filed Oct. 23, 2006, now U.S. Pat. No. 8,239,136, which claims the benefit of priority under 35 U.S.C. §119(e) to U.S. provisional application No. 60/729,055, filed Oct. 21, 2005, and to U.S. provisional application Ser. No. 60/758,418, filed Jan. 12, 2006, all of which are incorporated herein by reference in their entirety.

MATERIALS FILED VIA EFS-WEB

This application includes the following two tables as ASCII text files, which are hereby incorporated by reference in their entirety.

Table   Size       Created         Text File Name
1       1,444 KB   Oct. 21, 2005   table1.txt
11      399 KB     Oct. 21, 2005   table11.txt

LENGTHY TABLES

The patent application contains a lengthy table section. A copy of the table is available in electronic form from the USPTO website (http://seqdata.uspto.gov/?pageRequest=docDetail&DocID=US20150191796A1). An electronic copy of the table will also be available from the USPTO upon request and payment of the fee set forth in 37 CFR 1.19(b)(3).

TECHNICAL FIELD

The disclosure relates to apparatus, kits and computer-based methods for correlating data corresponding to levels of biomarker products with a disease state in a subject.

BACKGROUND

Colorectal cancer is the second-leading cause of cancer-related deaths in the United States (11). Each year, approximately 150,000 people are diagnosed with colorectal cancer and almost 60,000 people die from the disease. Of those diagnosed, nearly half are expected to die within five years, because most cancers are detected at a stage when they are less treatable. For those whose cancer is detected at an earlier stage, the five-year survival rate can be greater than 90%. The American Cancer Society recommends that all Americans age 50 and older be screened regularly for colorectal cancer. Unfortunately, only a fraction of this population is screened for the disease, as currently available screening technologies are considered too costly, too invasive or, in some cases, insufficiently accurate.

Most colorectal cancers begin as small, noncancerous (benign) clumps of cells called polyps. Over time, some of these polyps become cancerous. Incidence of polyps increases as individuals get older. It is estimated that 50% of the people over the age of 60 will have at least one polyp.

The significance of identifying one or more colorectal pathologies, including polyps, is that certain types of polyps are cancerous or indicative of an increased risk of developing cancer. It has been shown that the removal of certain subtypes of polyps significantly reduces the risk of getting colorectal cancer. Therefore, a test to screen for one or more colorectal pathologies, including polyps and/or certain subtypes of polyps, so as to allow early removal or to prevent unnecessary procedures, should markedly reduce the incidence of colorectal cancer (12) and decrease the current costs to the medical system.

Currently utilized screening technologies to identify polyps include 1) a fecal occult blood test (FOBT); 2) a flexible sigmoidoscopy; 3) double contrast barium enema (DCBE); and 4) colonoscopy. Sometimes two or more of these tests are used in combination. The current recommended standards for screening for colorectal cancer in men over the age of 50 who are considered part of an average risk population include: an FOBT yearly, a sigmoidoscopy every five years, a colonoscopy every ten years and a DCBE every five years. For a high risk population, where one or more family members have had colorectal cancer, a colonoscopy is recommended every two years as a follow-up to FOBT or sigmoidoscopy.

Each of these tests suffers significant disadvantages. FOBT testing, although a non-invasive procedure, requires significant dietary and other restrictions prior to testing and suffers from low sensitivity. Sigmoidoscopy and colonoscopy are more sensitive since they involve direct visualization of the lumen; however, sigmoidoscopy allows only partial visualization, and colonoscopy is known to miss about 12% of advanced adenomas. Both sigmoidoscopy and colonoscopy are highly invasive procedures that cause high levels of discomfort, leading many individuals to opt not to undergo these recommended screening procedures. Sigmoidoscopy and colonoscopy are also costly and may give rise to complications as a result of undergoing the procedure.

Thus, there is a need for an improved test which is minimally invasive so as to permit more widespread testing of the population to indicate the presence of one or more colorectal pathologies, and ensure greater adherence to recommended protocols. To date, despite this need, there have been very few advancements in identifying useful molecular biomarkers to test for colorectal pathology. Recent efforts have focused on DNA-based biomarker methods (see for example Shuber et al. U.S. Patent Application Publication No. 2005-0260638A1; or Lofton-Day et al. WO2005/001142).

Identification of biomarkers for use in a non-invasive test for colorectal pathology thus fulfills a longstanding need in the art.

SUMMARY

In contrast to technologies available in the art, the inventions described herein identify biomarkers not previously associated with colorectal pathology whose gene expression levels, measured alone or in combination, and optionally applied to formulas to convert the levels to a measure, give an indication of a likelihood of colorectal pathology.

The present invention discloses novel colorectal pathology-specific biomarkers, such as blood-specific biomarkers, and methods, compositions and kits for use in testing for colorectal pathologies, such as pre-cancerous and cancerous pathologies. This use can be effected in a variety of ways as further described and exemplified herein.

According to one aspect of the present invention there is provided a method of testing for one or more colorectal pathologies in a test subject, the method comprising: (a) providing data representative of a level of one or more products of each of one or more biomarkers in a sample from the test subject; and (b) ascertaining whether the data characterizes either: (i) subjects having the one or more colorectal pathologies, or (ii) subjects not having the one or more colorectal pathologies; thus providing an indication of a probability that the test subject has the one or more colorectal pathologies.

According to another aspect of the present invention there is provided a computer-based method for testing for one or more colorectal pathologies in a test subject, the method comprising: inputting, to a computer, data representing a level of products of each of one or more biomarkers in a sample isolated and/or derived from a test subject, wherein the biomarkers are genes selected from the group consisting of BCNP1, CD163, CDA, MS4A1, BANK1, and MGC20553; and causing the computer to ascertain whether the data characterizes either: (i) subjects having the one or more colorectal pathologies, or (ii) subjects not having the one or more colorectal pathologies, thus providing an indication of a probability that the test subject has the one or more colorectal pathologies.

According to still another aspect of the present invention there is provided a computer-readable medium comprising instructions for ascertaining whether data characterizes either: (i) subjects having one or more colorectal pathologies, or (ii) subjects not having one or more colorectal pathologies, the data representing a level of one or more products of each of one or more biomarkers in a sample isolated and/or derived from a test subject, wherein the biomarkers are genes selected from the group consisting of BCNP1, CD163, CDA, MS4A1, BANK1, and MGC20553, thus providing an indication of a probability that the test subject has the one or more colorectal pathologies.

According to yet another aspect of the present invention there is provided a computer system for providing an indication of a probability that a test subject has one or more colorectal pathologies, the computer system comprising a processor; and a memory configured with instructions that cause the processor to provide a user with the indication, wherein the instructions comprise ascertaining whether data characterizes either: (i) subjects having one or more colorectal pathologies, or (ii) subjects not having one or more colorectal pathologies, the data representing a level of one or more products of each of one or more biomarkers in a sample isolated or derived from the test subject, wherein the biomarkers are genes selected from the group consisting of BCNP1, CD163, CDA, MS4A1, BANK1, and MGC20553; thus providing the indication of a probability that the test subject has the one or more colorectal pathologies.

According to further features in preferred embodiments of the invention described below, the products in the sample are RNA.

According to still further features in the described preferred embodiments, the products in the sample are RNA, whereas the data represent a level of cDNA, EST and/or PCR product derived from the RNA.

According to still another aspect of the present invention there is provided a kit comprising packaging and containing one or more primer sets, wherein each set is able to generate an amplification product by selective amplification of at least a portion of a polynucleotide complementary to one or more RNA products of a biomarker, wherein the biomarker is a gene selected from the group consisting of: BCNP1, CD163, CDA, MS4A1, BANK1, and MGC20553; and wherein each of the primer sets is selective for a different biomarker.

According to further features in preferred embodiments of the invention described below, the complementary polynucleotide is selected from the group consisting of total RNA, mRNA, DNA, cDNA and EST.

According to still further features in the described preferred embodiments, the one or more biomarkers are at least two biomarkers.

According to still further features in the described preferred embodiments, each of the probes is capable of selectively hybridizing to either a sense or an antisense strand of the amplification product.

According to still further features in the described preferred embodiments, the kit further comprises two or more components selected from the group consisting of: a thermostable polymerase, a reverse transcriptase, deoxynucleotide triphosphates, nucleotide triphosphates and enzyme buffer.

According to still further features in the described preferred embodiments, the kit further comprises a computer-readable medium encoded with instructions for ascertaining whether data characterizes either: (i) subjects having one or more colorectal pathologies, or (ii) subjects not having one or more colorectal pathologies, the data representing levels of the amplification product in a sample isolated and/or derived from a test subject, thus providing an indication of a probability that the test subject has the one or more colorectal pathologies.

According to still further features in the described preferred embodiments, ascertaining whether the data characterizes either: (i) subjects having the one or more colorectal pathologies, or (ii) subjects not having the one or more colorectal pathologies, comprises applying to the data a formula based on (i) a dataset representing levels of one or more products of each of the biomarkers in each subject of a reference population having the one or more pathologies, and (ii) a dataset representing levels of one or more products of each of the biomarkers in each subject of a reference population not having the one or more pathologies.

According to still further features in the described preferred embodiments, ascertaining whether the data characterizes either: (i) subjects having the one or more colorectal pathologies, or (ii) subjects not having the one or more colorectal pathologies, comprises ascertaining whether the data correlates more closely with (i) a dataset representing levels of one or more products of each of the biomarkers in each subject of a reference population of subjects who have the one or more colorectal pathologies, or (ii) a dataset representing levels of one or more products of each of the biomarkers in each subject of a reference population of subjects who do not have the colorectal pathology.
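One way to make the "correlates more closely" comparison concrete is sketched below. It assumes, for illustration only, that each reference dataset is summarized by its mean profile (centroid) and that closeness is measured by the Pearson correlation coefficient; the disclosure does not specify either choice. The function names and the example levels are hypothetical.

```python
import math
from typing import Sequence

Profile = Sequence[float]  # levels of the same biomarkers, in a fixed order


def centroid(profiles: Sequence[Profile]) -> list:
    """Mean level of each biomarker across one reference population."""
    n = len(profiles)
    return [sum(p[k] for p in profiles) / n for k in range(len(profiles[0]))]


def pearson(x: Profile, y: Profile) -> float:
    """Pearson correlation of two equal-length, non-constant profiles."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def closer_reference(test: Profile, with_pathology: Sequence[Profile],
                     without_pathology: Sequence[Profile]) -> str:
    """Name the reference population whose centroid the test profile correlates with more."""
    r_with = pearson(test, centroid(with_pathology))
    r_without = pearson(test, centroid(without_pathology))
    return "with_pathology" if r_with > r_without else "without_pathology"


# Hypothetical levels of four biomarkers, for illustration only.
with_pathology = [[5.0, 1.0, 4.0, 2.0], [5.2, 0.8, 4.1, 2.2]]
without_pathology = [[1.0, 5.0, 2.0, 4.0], [1.2, 4.8, 2.1, 4.2]]
print(closer_reference([4.9, 1.1, 3.9, 2.0], with_pathology, without_pathology))
# prints: with_pathology
```

A distance measure (for example, Euclidean distance to each centroid) or a per-subject comparison would be equally plausible readings of the text; correlation was chosen here only because the passage speaks of correlating.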

According to still further features in the described preferred embodiments, the formula has the form $V = C + \sum_i \beta_i X_i$, wherein $V$ is a value indicating a probability that the test subject has the colorectal pathology, $X_i$ is a level of products of an $i$th biomarker of the biomarkers in the sample, $\beta_i$ is a coefficient, and $C$ is a constant.
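The linear form can be evaluated directly, as in the minimal sketch below. The coefficient values, the constant and the measured levels are hypothetical placeholders, not values from this disclosure. Converting V to a probability through the logistic function is an added assumption that holds only when the formula was derived by logistic regression, in which case V is a log-odds value.

```python
import math
from typing import Mapping


def linear_score(levels: Mapping[str, float], beta: Mapping[str, float], c: float) -> float:
    """V = C + sum_i beta_i * X_i, summed over every biomarker that has a coefficient."""
    return c + sum(b * levels[name] for name, b in beta.items())


def logistic_probability(v: float) -> float:
    """Convert V to a probability, ASSUMING V is a log-odds value from logistic regression."""
    return 1.0 / (1.0 + math.exp(-v))


# Hypothetical coefficients and constant, for illustration only.
BETA = {"BCNP1": 0.8, "CD163": -0.5, "CDA": 0.3}
C = -1.0

levels = {"BCNP1": 2.0, "CD163": 1.0, "CDA": 1.0}  # hypothetical measured levels X_i
v = linear_score(levels, BETA, C)   # approximately 0.4
p = logistic_probability(v)         # approximately 0.60
```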

According to still further features in the described preferred embodiments, the formula has the form $V = C + \sum_{i,j} \beta_{ij} (X_i / X_j)$, wherein $V$ is a value indicating a probability that the test subject has the colorectal pathology, $X_i$ is a level of products of an $i$th biomarker of the biomarkers, and $X_j$ is a level of products of a $j$th biomarker of the biomarkers in the sample, where the $i$th biomarker is not the $j$th biomarker, $\beta_{ij}$ is a coefficient, and $C$ is a constant.
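The ratio form can be sketched the same way, with one coefficient per ordered pair of distinct biomarkers. The pairs, coefficients, constant and levels below are hypothetical, and the levels are assumed to be positive linear-scale quantities so that the ratios are defined.

```python
from typing import Mapping, Tuple


def ratio_score(levels: Mapping[str, float],
                beta: Mapping[Tuple[str, str], float], c: float) -> float:
    """V = C + sum over (i, j) pairs of beta_ij * (X_i / X_j), with i != j."""
    total = c
    for (i, j), b in beta.items():
        if i == j:
            raise ValueError("the ith and jth biomarkers must be different")
        total += b * levels[i] / levels[j]  # levels[j] must be non-zero
    return total


# Hypothetical pair coefficients and levels, for illustration only.
BETA_PAIRS = {("MS4A1", "BANK1"): 0.5, ("CDA", "BANK1"): -1.0}
levels = {"MS4A1": 4.0, "BANK1": 2.0, "CDA": 1.0}
v = ratio_score(levels, BETA_PAIRS, c=0.2)  # 0.2 + 0.5 * 2.0 - 1.0 * 0.5, about 0.7
```

Because each term is a ratio of two levels from the same sample, multiplying every level in a sample by the same factor leaves V unchanged.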

According to still further features in the described preferred embodiments, the formula is derived by a method selected from the group consisting of logistic regression, linear regression, neural networks, and principal component analysis.
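For logistic regression, the constant C and coefficients of the linear form are chosen to maximize the likelihood of the two reference populations. The sketch below does this with plain batch gradient ascent, an illustrative choice; statistical packages (including WEKA, which is mentioned in the description of FIG. 4) use their own fitting procedures. The learning rate, epoch count and data are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                 lr: float = 0.1, epochs: int = 5000) -> Tuple[List[float], float]:
    """Fit V = C + sum_i beta_i X_i by batch gradient ascent on the mean log-likelihood.

    X holds one row of biomarker levels per reference subject; y is 1 for subjects
    with the pathology and 0 for subjects without it.
    """
    n, k = len(X), len(X[0])
    beta, c = [0.0] * k, 0.0
    for _ in range(epochs):
        grad_beta, grad_c = [0.0] * k, 0.0
        for row, label in zip(X, y):
            v = c + sum(b * x for b, x in zip(beta, row))
            err = label - sigmoid(v)  # residual on the probability scale
            grad_c += err
            for m in range(k):
                grad_beta[m] += err * row[m]
        c += lr * grad_c / n
        beta = [b + lr * g / n for b, g in zip(beta, grad_beta)]
    return beta, c


# Hypothetical one-biomarker reference data, for illustration only.
X = [[0.0], [0.5], [1.0], [2.2], [1.8], [2.5], [3.0], [3.5]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
beta, c = fit_logistic(X, y)
```

The two groups overlap (2.2 in the negative group, 1.8 in the positive group), so the maximum-likelihood coefficients are finite and the iteration settles rather than growing without bound.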

According to still further features in the described preferred embodiments, the sample is selected from the group consisting of blood, lymph and lymphoid tissue.

According to still further features in the described preferred embodiments, the sample is selected from the group consisting of a sample of serum-reduced blood, a sample of erythrocyte-reduced blood, a sample of serum-reduced and erythrocyte-reduced blood, a sample of unfractionated cells of lysed blood, and a sample of fractionated blood.

According to a further aspect of the present invention there is provided a composition comprising a collection of two or more isolated polynucleotides, wherein each isolated polynucleotide selectively hybridizes to a biomarker selected from the biomarkers set out in Table 2 and wherein the composition is used to measure the level of expression of at least two of the biomarkers.

According to further features in preferred embodiments of the invention described below, each isolated polynucleotide selectively hybridizes to a biomarker selected from the group consisting of membrane-bound transcription factor protease site 1 (MBTPS1); MGC45871; muskelin 1 (MKLN1); nipped-B homolog (NIPBL); acylpeptide hydrolase (APEH); FLJ23091; MGC40157; and protein phosphatase 1 regulatory subunit 2 (PPP1R2); and wherein the composition is used to measure the level of expression of at least two of the biomarkers.

According to yet a further aspect of the present invention there is provided a composition comprising a collection of two or more isolated polynucleotides, wherein each isolated polynucleotide selectively hybridizes to (a) an RNA product of a biomarker selected from the biomarkers set out in Table 2, and/or (b) a polynucleotide sequence complementary to (a), wherein the composition is used to measure the level of RNA expression of at least two of the biomarkers.

According to further features in preferred embodiments of the invention described below, each isolated polynucleotide selectively hybridizes to (a) an RNA product of a biomarker selected from the group consisting of membrane-bound transcription factor protease site 1 (MBTPS1); MGC45871; muskelin 1 (MKLN1); nipped-B homolog (NIPBL); acylpeptide hydrolase (APEH); FLJ23091; MGC40157; and protein phosphatase 1 regulatory subunit 2 (PPP1R2); and/or (b) a polynucleotide sequence complementary to (a), wherein the composition is used to measure the level of RNA expression of at least two of the biomarkers.

According to still a further aspect of the present invention there is provided a composition comprising a collection of two or more isolated polynucleotides, wherein each isolated polynucleotide selectively hybridizes to (a) an RNA sequence set out in Table 3; and/or (b) a polynucleotide sequence complementary to (a).

According to an additional aspect of the present invention there is provided a composition comprising a collection of two or more sets of biomarker specific primers as set out in Table 4 and/or Table 6.

According to yet an additional aspect of the present invention there is provided a composition comprising two or more polynucleotide probes as set out in Table 4.

According to further features in preferred embodiments of the invention described below, the polynucleotides are useful in quantitative RT-PCR (QRT-PCR).

According to still further features in the described preferred embodiments, the isolated polynucleotides comprise single- or double-stranded RNA.

According to still further features in the described preferred embodiments, the isolated polynucleotides comprise single- or double-stranded DNA.

According to still an additional aspect of the present invention there is provided a composition comprising a collection of two or more isolated proteins, wherein each isolated protein binds selectively to a protein product of a biomarker selected from the biomarkers set out in Table 2 and wherein the composition is used to measure the level of expression of at least two of the biomarkers.

According to further features in preferred embodiments of the invention described below, each isolated protein binds selectively to a protein product of a biomarker selected from the group consisting of membrane-bound transcription factor protease site 1 (MBTPS1); MGC45871; muskelin 1 (MKLN1); nipped-B homolog (NIPBL); acylpeptide hydrolase (APEH); FLJ23091; MGC40157; and protein phosphatase 1 regulatory subunit 2 (PPP1R2); and wherein the composition is used to measure the level of expression of at least two of the biomarkers.

According to still further features in the described preferred embodiments, the isolated proteins are selected from the proteins set out in Table 5.

According to still further features in the described preferred embodiments, the isolated proteins are ligands.

According to still further features in the described preferred embodiments, the ligands are antibodies or fragments thereof.

According to still further features in the described preferred embodiments, the antibodies are monoclonal antibodies.

According to yet still an additional aspect of the present invention there is provided a method of diagnosing or detecting one or more colon pathologies in an individual comprising: (a) determining the level of RNA product of one or more biomarkers selected from the group consisting of the biomarkers set out in Table 2 in a sample of an individual; and (b) comparing the level of RNA products in the sample with a control, wherein detecting differential expression of the RNA products between the individual and the control is indicative of one or more colon pathologies in the individual.

According to further features in preferred embodiments of the invention described below, the method of diagnosing or detecting one or more colon pathologies in an individual comprising (a) determining the level of RNA product of one or more biomarkers selected from the group consisting of the biomarkers set out in Table 2 in a sample of an individual, and (b) comparing the level of RNA products in the sample with a control, further comprises: (a) determining the level of RNA product of one or more biomarkers selected from the group consisting of membrane-bound transcription factor protease site 1 (MBTPS1); MGC45871; muskelin 1 (MKLN1); nipped-B homolog (NIPBL); acylpeptide hydrolase (APEH); FLJ23091; MGC40157; and protein phosphatase 1 regulatory subunit 2 (PPP1R2) in a sample from an individual; and (b) comparing the level of RNA products in the sample with a control, wherein detecting differential expression of the RNA products between the individual and the control is indicative of one or more colon pathologies in the individual.

According to still further features in the described preferred embodiments, the sample comprises whole blood.

According to still further features in the described preferred embodiments, the sample comprises a drop of whole blood.

According to still further features in the described preferred embodiments, the sample comprises blood that has been lysed.

According to still further features in the described preferred embodiments, prior to the determining step, the method comprises isolating RNA from the sample.

According to still further features in the described preferred embodiments, the step of determining the level of the RNA products comprises using quantitative RT-PCR (QRT-PCR).

According to still further features in the described preferred embodiments, the QRT-PCR comprises hybridizing primers to the one or more RNA products or the complement thereof.

According to still further features in the described preferred embodiments, the primers are 15-25 nucleotides in length.

According to still further features in the described preferred embodiments, the step of determining the level of each of the one or more RNA products comprises hybridizing a first plurality of isolated polynucleotides that correspond to the one or more transcripts to an array comprising a second plurality of isolated polynucleotides.

According to still further features in the described preferred embodiments, the first plurality of isolated polynucleotides comprises RNA, DNA, cDNA, PCR products or ESTs.

According to still further features in the described preferred embodiments, the array comprises a plurality of isolated polynucleotides comprising RNA, DNA, cDNA, PCR products or ESTs.

According to still further features in the described preferred embodiments, the second plurality of isolated polynucleotides on the array comprises polynucleotides corresponding to one or more of the biomarkers of Table 2.

According to still further features in the described preferred embodiments, the control is derived from an individual that does not have one or more colon pathologies.

According to another aspect of the present invention there is provided a kit for diagnosing or detecting one or more colon pathologies comprising any one of the compositions and instructions for use.

According to yet another aspect of the present invention there is provided a kit for diagnosing or detecting one or more colon pathologies comprising: (a) at least two sets of biomarker specific primers, wherein each set of biomarker specific primers produces double-stranded DNA complementary to a unique biomarker selected from Table 2; wherein the first primer of each set contains a sequence which can selectively hybridize to RNA, cDNA or an EST complementary to one of the biomarkers to create an extension product, and the second primer of each set is capable of selectively hybridizing to the extension product; (b) an enzyme with reverse transcriptase activity; (c) an enzyme with thermostable DNA polymerase activity; and (d) a labeling means; wherein each of the primer sets is used to detect the quantitative expression levels of the biomarker in a test subject.

According to further features in preferred embodiments of the invention described below, the kit for diagnosing or detecting one or more colon pathologies comprising (a) at least two sets of biomarker specific primers, wherein each set of biomarker specific primers produces double-stranded DNA complementary to a unique biomarker selected from Table 2, further comprises: (a) at least two sets of biomarker specific primers, wherein each set of biomarker specific primers produces double-stranded DNA complementary to a unique biomarker selected from the group consisting of membrane-bound transcription factor protease site 1 (MBTPS1); MGC45871; muskelin 1 (MKLN1); nipped-B homolog (NIPBL); acylpeptide hydrolase (APEH); FLJ23091; MGC40157; and protein phosphatase 1 regulatory subunit 2 (PPP1R2); wherein the first primer of each set contains a sequence which can selectively hybridize to RNA, cDNA or an EST complementary to one of the biomarkers to create an extension product, and the second primer of each set is capable of selectively hybridizing to the extension product; (b) an enzyme with reverse transcriptase activity; (c) an enzyme with thermostable DNA polymerase activity; and (d) a labeling means; wherein each of the primer sets is used to detect the quantitative expression levels of the biomarker in a test subject.

According to still another aspect of the present invention there is provided a method for diagnosing or detecting one or more colon pathologies in an individual comprising: (a) determining the level of protein product of one or more biomarkers selected from the group consisting of the biomarkers set out in Table 2 in a sample from an individual; and (b) comparing the level of protein products in the sample with a control, wherein detecting differential expression of the protein products between the individual and the control is indicative of one or more colon pathologies in the individual.

According to further features in preferred embodiments of the invention described below, the method for diagnosing or detecting one or more colon pathologies in an individual comprising determining the level of protein product of one or more biomarkers selected from the group consisting of the biomarkers set out in Table 2 in a sample from an individual further comprises: (a) determining the level of protein product of one or more biomarkers selected from the group consisting of membrane-bound transcription factor protease site 1 (MBTPS1); MGC45871; muskelin 1 (MKLN1); nipped-B homolog (NIPBL); acylpeptide hydrolase (APEH); FLJ23091; MGC40157; and protein phosphatase 1 regulatory subunit 2 (PPP1R2) in a sample from an individual; and (b) comparing the level of protein products in the sample with a control, wherein detecting differential expression of the protein products between the individual and the control is indicative of one or more colon pathologies in the individual.

According to still further features in the described preferred embodiments, the level of protein product is determined using antibodies or fragments thereof.

According to still further features in the described preferred embodiments, the antibodies are selected from the group of antibodies set out in Table 5.

According to still further features in the described preferred embodiments, the antibodies are monoclonal antibodies.

According to a further aspect of the present invention there is provided a composition comprising a collection of two or more isolated polynucleotides, wherein each isolated polynucleotide selectively hybridizes to a biomarker selected from the biomarkers set out in Table 12 and wherein the composition is used to measure the level of expression in blood of at least two of the biomarkers.

According to yet a further aspect of the present invention there is provided a kit for diagnosing or detecting one or more colon pathologies comprising any one of the compositions comprising a collection of two or more isolated proteins and instructions for use.

The present invention successfully addresses the shortcomings of the presently known configurations, in particular by providing an effective and non-invasive method of testing for colorectal pathologies, such as pre-cancerous and cancerous pathologies, via biomarker analysis in surrogate tissues such as blood.

Other features and advantages of the invention will become apparent from the following detailed description. It should be understood, however, that the detailed description and the specific examples, while indicating preferred embodiments of the invention, are given by way of illustration only, since various changes and modifications within the spirit and scope of the invention will become apparent to those skilled in the art from this detailed description.

BRIEF DESCRIPTION OF THE DRAWINGS

The invention will now be described in relation to the drawings in which:

FIG. 1 shows an exemplary computer system for practicing certain of the methods described herein.

FIG. 2 is a comparison of gene expression profiles obtained from hybridization of RNA isolated from serum-reduced, erythrocyte-reduced blood from 23 controls having no identified colorectal polyps and 22 subjects with colorectal polyps, as described in Example 2. Patients with colorectal polyps could have one or more of the subtypes of polyps, including Hyperplastic; Tubular Adenoma; Villous Adenoma; Tubulovillous Adenoma; Hyperplasia; High Grade Dysplasia and Colorectal cancer. Gene expression profiles were clustered according to the expression of 86 significantly (P<0.001) differentially expressed genes. As noted in grayscale, some of the individuals were misclassified (i.e., they are shown in a different grayscale shade under the corresponding bracket) and are considered outliers. Each column indicates a gene expression profile for a single sample, and each row represents the expression level of a single gene in each of the samples. The color of each band within a row indicates the relative level of gene expression (grayscale represents the level of expression, from low to high). The resulting gene list from FIG. 2 is shown in Table 1.

FIG. 3 shows blood mRNA levels as tested by QRT-PCR for four upregulated genes (A) and four downregulated genes (B) selected from the genes demonstrating statistically significant differential expression using microarray analysis, as described in Example 5. QRT-PCR results were compared between 50 patients diagnosed as having colorectal pathology (i.e., one or more subtypes of colorectal polyps) (n=50) and 78 control individuals diagnosed as not having colorectal pathology (n=78). Fold change was calculated using the comparative Ct method. The Mann-Whitney test was used for statistical analysis between the two groups. Results with a p value of less than 0.05 were considered statistically significant, suggesting that the gene corresponding to the mRNA level tested is differentially expressed between the two patient populations (patients with colorectal pathology and patients without colorectal pathology). The lines inside the boxes denote the medians. The boxes mark the interval between the 25th and 75th percentiles. The whiskers denote the interval between the 10th and 90th percentiles. Individually plotted symbols indicate data points outside the 10th and 90th percentiles.
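The two calculations named in this description can be sketched as follows. The comparative Ct fold change is shown in its common 2^-ddCt form, which assumes normalization to a reference gene and comparison against a calibrator; neither is identified in this excerpt, so those inputs are hypothetical. The Mann-Whitney p value uses the large-sample normal approximation without tie or continuity correction, a simplification rather than the procedure actually used for FIG. 3.

```python
import math
from typing import Sequence, Tuple


def fold_change_ddct(ct_target: float, ct_reference: float,
                     ct_target_calibrator: float, ct_reference_calibrator: float) -> float:
    """Comparative Ct (2^-ddCt) fold change of a target gene relative to a calibrator."""
    d_ct_sample = ct_target - ct_reference  # normalize to a (hypothetical) reference gene
    d_ct_calibrator = ct_target_calibrator - ct_reference_calibrator
    dd_ct = d_ct_sample - d_ct_calibrator
    return 2.0 ** (-dd_ct)


def mann_whitney_u(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """U statistic for group a and a two-sided p value from the normal approximation.

    Tied pairs count 1/2; no tie or continuity correction is applied, so small
    samples would call for an exact test instead.
    """
    u = sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in a for y in b)
    n1, n2 = len(a), len(b)
    mean_u = n1 * n2 / 2.0
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided tail probability
    return u, p


# Hypothetical Ct values: the target is one cycle earlier than in the calibrator.
print(fold_change_ddct(25.0, 20.0, 26.0, 20.0))  # 2.0
```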

FIG. 4 depicts an ROC curve for a selected biomarker combination from combinations of pairs of the eight biomarkers tested and shown in FIG. 3. Experimental details are as described in Example 5. Combinations of a selection of the biomarkers identified in Table 1 were tested to determine whether the combinations could screen test subjects for one or more colorectal pathologies more effectively than the biomarkers of Table 1 used individually. QRT-PCR as described in Example 3 was performed to measure the level of the RNA products of a select group of individual biomarkers from Table 1. Selected combinations of biomarkers were tested by applying logistic regression analysis to the QRT-PCR results of the selected combination, and the ROC curve for the resulting logistic regression equation (logit function) was determined. Panel (A): ROC curve (ROC area 0.72) for the logit function for one of the datasets tested (AJ36h). The corresponding function returned by the SimpleLogistic operator in WEKA had an ROC area of 0.66.
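The ROC curve and ROC area for a logit function can be computed from the function's output values for subjects of known status. The sketch below is a generic illustration of that calculation with hypothetical scores; it is not the WEKA computation referred to above.

```python
from typing import List, Sequence, Tuple


def roc_points(scores: Sequence[float], labels: Sequence[int]) -> List[Tuple[float, float]]:
    """(false positive rate, true positive rate) pairs as the threshold is lowered.

    A subject is called positive when its score is at or above the threshold;
    tied scores are consumed together so that they form a single ROC step.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    tp = fp = 0
    points = [(0.0, 0.0)]
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def roc_area(points: Sequence[Tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x2 - x1) * (y1 + y2) / 2.0
               for (x1, y1), (x2, y2) in zip(points, points[1:]))


# Hypothetical logit values for four subjects (1 = pathology, 0 = no pathology).
points = roc_points([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])
area = roc_area(points)  # 0.75
```

The trapezoidal area equals the fraction of (pathology, no-pathology) pairs in which the pathology subject scores higher, with ties counted as one half, which is why the ROC area and the Mann-Whitney statistic are often reported together.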

FIG. 5 depicts the graphical output of the analysis of all possible combinations of 9 genes selected from the genes depicted in Table 12 and further described in Example 8. Shown is a graphical depiction of ROC area, sensitivity (with specificity set at the 90% threshold) and specificity (with sensitivity set at the 90% threshold) for each possible combination of 1, 2, 3, 4, 5, 6, 7, 8, or all 9 genes. Further details are described in Example 8.
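The phrase "all possible combinations" of 9 genes covers every non-empty subset, which gives 2^9 − 1 = 511 candidate panels. The sketch below enumerates them. The gene names are placeholders, not the Table 12 genes chosen in Example 8.

```python
from itertools import combinations
from math import comb
from typing import List, Sequence, Tuple


def all_gene_subsets(genes: Sequence[str]) -> List[Tuple[str, ...]]:
    """Every non-empty combination of the genes, from singletons to the full set."""
    subsets: List[Tuple[str, ...]] = []
    for k in range(1, len(genes) + 1):
        subsets.extend(combinations(genes, k))
    return subsets


genes = [f"GENE{i}" for i in range(1, 10)]  # hypothetical placeholders for 9 genes
subsets = all_gene_subsets(genes)
print(len(subsets))                        # 511 == 2**9 - 1
print([comb(9, k) for k in range(1, 10)])  # number of subsets of each size
```

Each subset would then be fitted and scored, for example by ROC area and by sensitivity at a fixed specificity. The same enumeration for the 6 genes of Example 11 yields 63 subsets.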

FIG. 6 depicts the graphical output of the analysis of all possible combinations of 6 genes as further described in Example 11. Shown is a graphical depiction of the ROC area, sensitivity (with specificity set at the 90% threshold) and specificity (with sensitivity set at the 90% threshold) for each possible combination of 1, 2, 3, 4, 5 and all 6 of the genes noted in Example 11.
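A short sketch can illustrate reading "sensitivity with specificity set at the 90% threshold" off a set of classifier scores. The cutoff rule here is an assumption, since the source does not state one. The sketch takes the lowest observed control score whose specificity reaches the target and calls any higher score positive. All scores are made up.

```python
from typing import Sequence


def sensitivity_at_specificity(pos: Sequence[float], neg: Sequence[float],
                               target: float = 0.90) -> float:
    """Sensitivity at the lowest cutoff (taken from the observed negative
    scores) whose specificity reaches `target`; a score above the cutoff is
    called positive."""
    for cutoff in sorted(set(neg)):
        specificity = sum(s <= cutoff for s in neg) / len(neg)
        if specificity >= target:
            return sum(s > cutoff for s in pos) / len(pos)
    return 0.0  # only reached when neg is empty


neg_scores = list(range(10))    # made-up scores for 10 controls
pos_scores = [5, 8, 9, 10, 11]  # made-up scores for 5 cases
print(sensitivity_at_specificity(pos_scores, neg_scores))  # 0.6
```

Specificity at a fixed 90% sensitivity is computed analogously, by choosing the cutoff from the case scores instead. Tie handling at the cutoff is a convention and may differ between software packages.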

BRIEF DESCRIPTION OF THE TABLES

Table 1 (filed herewith via EFS-Web) shows genes identified as differentially expressed in samples from individuals having or not having one or more colorectal polyps of any type. The polyps can be of one or more of the following subtypes: Hyperplastic; Tubular Adenoma; Villous Adenoma; Tubulovillous Adenoma; Hyperplasia; High Grade Dysplasia; and Cancer. For each gene, the table provides:

- the HUGO gene symbol and locus link ID;
- the human RNA and protein accession numbers;
- the p value, which represents the statistical significance of the observed differential expression as determined by measuring RNA encoded by the biomarkers noted;
- a measure of the fold change between the average measured level of individuals having one or more types of colorectal pathology and the average measured level of individuals not having colorectal pathology.

Column 1 is AffySpotID, column 2 is GeneSymbol, column 3 is GeneID, column 4 is p value, column 5 is the Human RNA Accession Number, column 6 is the Human Protein Accession Number, column 7 is the Fold Change, and column 8 is the Gene Description.

Table 2 is a selection of genes listed in Table 1. The table provides the gene symbol, locus link ID, and gene description. It also includes:

- the p value, which represents the statistical significance of the observed differential expression;
- the Mann-Whitney value, another measure of the statistical significance of differentiating samples with and without colorectal pathology;
- the measure of the fold change between the average measured level of individuals having polyps and the average measured level of individuals not having polyps;
- the direction of the differential expression between individuals having colorectal pathology and individuals not having colorectal pathology.

Table 3 provides the human RNA accession number and human protein accession number of various species of the biomarkers identified as differentially expressed in samples from individuals having or not having colorectal polyps. The table provides the gene symbol and a description of the gene.

Table 4 provides a selection of examples of primers and TaqMan® probes useful in the invention for measuring the RNA products of the biomarkers disclosed in Table 2.

Table 5 provides references to commercially available antibodies for protein products of the genes identified in Table 2.

Table 6 shows primer sequences used to perform RT-PCR, as noted in Example 3, to measure the differential expression of RNA products of the genes (biomarkers) from Table 2 in samples from individuals having or not having one or more polyps. The table also provides the gene symbol and RNA accession number corresponding to the biomarkers tested.

Table 7 summarizes the phenotypic information of the patients used in Example 5 to test selected biomarkers for their ability to detect colorectal polyps. The listed parameters include sample size, gender, age and polyp subtype (as determined by pathology reports).

Table 8 lists selected classifiers for use with data corresponding to eight selected biomarkers: MBTPS1, MGC45871, MKLN1, NIPBL, APEH, MGC40157, PPP1R2, and FLJ23091. When used to test for the presence of colorectal polyps as described in Example 5, these classifiers resulted in an ROC area of 0.72. Results of QRT-PCR of the eight selected genes are shown in FIG. 3.

Table 9 provides the results of the blind test described in Example 5 when applying the formula composed of the classifiers noted in Table 8.

Table 10 is a list of reporter genes and the properties of the reporter gene products which may be used to identify compounds for use in preventing or treating one or more forms of colorectal pathology.

Table 11 (filed herewith via EFS-Web) shows genes identified by microarray analysis, as described in Example 2, as differentially expressed in samples from individuals having “high risk polyps” compared with individuals not having high risk polyps (i.e., having low risk polyps or no pathology at all). The table provides the gene name, gene ID and a representative human RNA accession number. It also provides:

- the p value;
- the fold change between the average of individuals classified as having high risk polyps and the average of individuals having low risk polyps;
- the coefficient of variation (the standard deviation of the normalized intensity divided by the mean normalized intensity), reported separately for the high risk polyp individuals and the low risk polyp individuals.

Column 1 is AffySpotID, column 2 is Fold Change, column 3 is p value, column 4 is CV (Coefficient of Variation) (High Risk Polyp), column 5 is CV (Low Risk Polyp), column 6 is Gene ID, column 7 is the HUGO Gene Symbol, column 8 is the Human RNA Accession Number and column 9 is the Gene Description.
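The coefficient of variation reported in Table 11 is defined as the standard deviation of the normalized intensity divided by its mean. A minimal sketch of that ratio for one probe set in each group follows. It assumes the sample (n − 1) standard deviation, which the source does not specify, and uses hypothetical intensities.

```python
from statistics import mean, stdev
from typing import Sequence


def coefficient_of_variation(intensities: Sequence[float]) -> float:
    """CV = standard deviation / mean of the normalized intensities.
    Uses the sample standard deviation (n - 1 denominator), an assumption."""
    return stdev(intensities) / mean(intensities)


# Hypothetical normalized intensities for one probe set in each group.
high_risk = [8.0, 10.0, 12.0]
low_risk = [9.0, 9.5, 10.5]
print(round(coefficient_of_variation(high_risk), 3))  # 0.2
print(round(coefficient_of_variation(low_risk), 3))
```

Because the CV is unitless, it can be compared directly between the high risk and low risk groups even when their mean intensities differ.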

Table 12 shows 48 biomarkers tested by QRT-PCR for differential expression in samples from individuals having colorectal cancer and individuals not having colorectal cancer. The table provides the gene symbol, locus link ID, and gene description for each biomarker. It also includes:

- the p value, which represents the statistical significance of the observed differential expression;
- the measure of the fold change between the average measured level of individuals having colorectal cancer and the average measured level of individuals not having colorectal cancer;
- the direction of the differential expression between individuals having colorectal cancer and individuals not having colorectal cancer.

Table 13 provides the human RNA accession number and human protein accession number of various species of the biomarkers identified in Table 12 as differentially expressed in samples from individuals having or not having colorectal cancer. The table provides the gene symbol and a description of the gene.

Table 14 provides a selection of examples of primers and TaqMan® probes useful for measuring one or more RNA products of the biomarkers disclosed in Table 12, as described in Examples 6, 7, 8, or 9.

Table 15 provides references to commercially available antibodies for measuring protein products of the biomarkers identified in Table 12.

Table 16 shows a selection of primers which have been used for the genes described in Table 12 to quantify one or more RNA products of the biomarkers.

Table 17 shows primers and TaqMan® probes used in Example 11.

DETAILED DESCRIPTION

(A) Overview

In one aspect, the invention discloses methods of generating formulas/classifiers. These can be applied to data corresponding to levels of one or more products of selected biomarker combinations to classify test subjects as having one or more colorectal pathologies or one or more subtypes of colorectal pathologies. Also disclosed are:

- biomarkers whose product levels are useful for testing subjects for one or more colorectal pathologies or one or more subtypes of colorectal pathologies;
- computer-readable media comprising instructions for applying a formula to data representing a level of products of biomarkers so as to test subjects for one or more colorectal pathologies;
- a computer system configured with instructions to provide the user with an indication of the probability that a test subject has one or more colorectal pathologies, by applying a formula to biomarker product data.

The present invention provides biomarker product ligands capable of specific hybridization with RNA biomarker products so as to enable quantitation of the biomarker products. The biomarker product ligands may enable quantitation, directly and/or indirectly, in any one of various ways known to the skilled artisan. The biomarker product ligands capable of specifically hybridizing with biomarker RNA products or polynucleotides derived therefrom may have any one of various compositions. For example, specific ligands of biomarker products, such as biomarker RNA products, may be either of the following:

- polynucleotides, e.g., polynucleotides complementary to at least a portion of the RNA products or polynucleotides derived therefrom;
- polypeptides, e.g., antibodies or affinity-selected polypeptides specific for at least a portion of the RNA products or polynucleotides derived therefrom.

In one embodiment, polynucleotide and/or polypeptide ligands are disclosed which are probes capable of specifically and/or selectively hybridizing with biomarker RNA products and/or polynucleotide products so as to quantitate them. Such probes include those useful in techniques such as quantitative real-time PCR (QRT-PCR), and may be used, for example, with SYBR® Green, or with TaqMan® or Molecular Beacon techniques. In one embodiment, the polynucleotides are nucleic acid probes which can be spotted onto an array to measure levels of biomarker RNA products, or of polynucleotides derived therefrom, isolated or derived from test subjects. In another embodiment, arrays for use in measuring the expression of the RNA products are contemplated.

In another embodiment, polynucleotide ligands are disclosed which are biomarker-specific primer sets capable of specifically amplifying biomarker RNA products and/or polynucleotides corresponding to biomarker RNA products.

Further disclosed are methods of screening products of the identified biomarkers to identify therapeutic targets for treating or preventing one or more colorectal pathologies.

Also provided are kits of polynucleotide and/or polypeptide ligands that can be used to detect and monitor differential gene expression of the products of the identified biomarkers and biomarker combinations. Kits that include a computer readable medium providing an indication of the probability that a test subject has one or more colorectal pathologies are also provided. Methods of generating the formulas for testing for one or more colorectal pathologies are also provided.

Further disclosed is the measurement and/or monitoring of biomarker product levels to screen for therapeutic targets for treating or preventing one or more colorectal pathologies. Methods of generating the formulas for testing a test subject for one or more colorectal pathologies are also provided.

Also disclosed are methods of testing combinations of biomarkers by generating classifiers. Classifiers are generated by applying one or more mathematical models to data representative of the level of expression of the RNA and/or protein products of the biomarkers. These data are collected across a reference population that includes subjects who have one or more colorectal pathologies, or one or more subtypes thereof, and subjects who do not have the one or more colorectal pathologies. Classifiers can be used alone or in combination to create a formula useful for testing a subject for the probability of having one or more colorectal pathologies or subtypes thereof. Also disclosed are methods of further selecting classifiers on the basis of area under the curve (AUC), sensitivity and/or specificity. One or more selected classifiers can be used to generate a formula, and classifiers can subsequently be selected for inclusion in the formulas. Classifiers are generated by measuring levels of one or more RNA products and/or one or more protein products of the biomarkers in a sample and using the resulting data as input to the mathematical model. The method used to generate data from the test subject for diagnostic use need not be the same as the method used to generate the data for creating the formulas.

Other aspects of the invention are disclosed herein.

(B) Definitions

The practice of the present invention will employ, unless otherwise indicated, techniques of molecular biology, microbiology and recombinant DNA, which are familiar to one of skill in the art. Such techniques are explained fully in the literature. See, e.g., Sambrook, Fritsch & Maniatis, 1989, Molecular Cloning: A Laboratory Manual, Second Edition; Oligonucleotide Synthesis (M. J. Gait, ed., 1984); Nucleic Acid Hybridization (B. D. Hames & S. J. Higgins, eds., 1984); A Practical Guide to Molecular Cloning (B. Perbal, 1984); the series Methods in Enzymology (Academic Press, Inc.); and Short Protocols In Molecular Biology (Ausubel et al., eds., 1995). All patents, patent applications, and publications mentioned herein, both supra and infra, are hereby incorporated by reference in their entireties.

The following definitions are provided for specific terms which are used in the following written description.

As used herein, the “5′ end” refers to the end of an mRNA up to the first 1000 nucleotides or ⅓ of the mRNA (where the full length of the mRNA does not include the poly A tail), starting at the first nucleotide of the mRNA. The “5′ region” of a gene refers to a polynucleotide (double-stranded or single-stranded) located within or at the 5′ end of a gene. It includes, but is not limited to, the 5′ untranslated region, if present, and the 5′ protein coding region of a gene. The 5′ region is not shorter than 8 nucleotides and not longer than 1000 nucleotides in length. Other possible lengths of the 5′ region include but are not limited to 10, 20, 25, 50, 100, 200, 400, and 500 nucleotides.

As used herein, the “3′ end” refers to the end of an mRNA up to the last 1000 nucleotides or ⅓ of the mRNA. Its 3′ terminal nucleotide is the terminal nucleotide of the coding or untranslated region that adjoins the poly-A tail, if one is present. That is, the 3′ end of an mRNA does not include the poly-A tail, if one is present. The “3′ region” of a gene refers to a polynucleotide (double-stranded or single-stranded) located within or at the 3′ end of a gene. It includes, but is not limited to, the 3′ untranslated region, if present, and the 3′ protein coding region of a gene. The 3′ region is not shorter than 8 nucleotides and not longer than 1000 nucleotides in length. Other possible lengths of the 3′ region include but are not limited to 10, 20, 25, 50, 100, 200, 400, and 500 nucleotides.

As used herein, the “internal coding region” of a gene refers to a polynucleotide (double-stranded or single-stranded) located between the 5′ region and the 3′ region of a gene as defined herein. The “internal coding region” is not shorter than 8 nucleotides and not longer than 1000 nucleotides in length. Other possible lengths of the “internal coding region” include but are not limited to 10, 20, 25, 50, 100, 200, 400, and 500 nucleotides. The 5′, 3′ and internal regions are non-overlapping. They may, but need not, be contiguous, and may, but need not, add up to the full length of the corresponding gene.
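The following sketch shows one possible reading of the 5′ end and 3′ end definitions applied to an mRNA sequence string. It rests on two assumptions that the text does not settle. First, "up to 1000 nucleotides or ⅓ of the mRNA" is read as whichever of the two is smaller. Second, the caller supplies the length of the poly-A tail, since a trailing run of A's cannot be reliably separated from the 3′ untranslated region by sequence alone. The function name is hypothetical.

```python
from typing import Tuple


def mrna_end_regions(mrna: str, polya_length: int = 0,
                     max_length: int = 1000) -> Tuple[str, str]:
    """Return (5' end, 3' end) of an mRNA under one reading of the definitions.

    Assumptions: `polya_length` (number of trailing poly-A tail bases) is
    supplied by the caller, and "up to 1000 nucleotides or 1/3 of the mRNA"
    is read as whichever of the two is smaller.
    """
    body = mrna[: len(mrna) - polya_length]  # exclude the poly-A tail
    span = min(max_length, len(body) // 3)
    return body[:span], body[len(body) - span:]


transcript = "ACGT" * 300 + "A" * 40  # hypothetical 1200-nt body plus a 40-nt tail
five_end, three_end = mrna_end_regions(transcript, polya_length=40)
print(len(five_end), len(three_end))  # 400 400
```

Under the same reading, the sequence between the two ends, `body[span:len(body) - span]`, would contain the internal coding region.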

As used herein, the “amino terminal” region of a polypeptide refers to the polypeptide sequences encoded by polynucleotide sequences (double-stranded or single-stranded) located within or at the 5′ end of an mRNA molecule. As used herein, the “amino terminal” region refers to the amino terminal end of a polypeptide up to the first 300 amino acids or ⅓ of the polypeptide, starting at the first amino acid of the polypeptide. The “amino terminal” region of a polypeptide is not shorter than 3 amino acids and not longer than 350 amino acids in length. Other possible lengths of the “amino terminal” region of a polypeptide include but are not limited to 5, 10, 20, 25, 50, 100 and 200 amino acids.

As used herein, the “carboxy terminal” region of a polypeptide refers to the polypeptide sequences encoded by polynucleotide sequences (double-stranded or single-stranded) located within or at the 3′ end of an mRNA molecule. As used herein, the “carboxy terminal” region refers to the carboxy terminal end of a polypeptide up to 300 amino acids or ⅓ of the polypeptide from the last amino acid of the polypeptide. The 3′ end of the mRNA does not include the poly-A tail, if one is present. The “carboxy terminal” region of a polypeptide is not shorter than 3 amino acids and not longer than 350 amino acids in length. Other possible lengths of the “carboxy terminal” region of a polypeptide include, but are not limited to, 5, 10, 20, 25, 50, 100 and 200 amino acids.

As used herein, the “internal polypeptide region” of a polypeptide refers to the polypeptide sequences located between the amino terminal region and the carboxy terminal region of a polypeptide, as defined herein. The “internal polypeptide region” of a polypeptide is not shorter than 3 amino acids and not longer than 350 amino acids in length. Other possible lengths of the “internal polypeptide region” of a polypeptide include, but are not limited to, 5, 10, 20, 25, 50, 100 and 200 amino acids. The amino terminal, carboxy terminal and internal polypeptide regions of a polypeptide are non-overlapping. They may, but need not, be contiguous, and may, but need not, add up to the full length of the corresponding polypeptide.

As used herein, the term “amplified”, when applied to a nucleic acid sequence, refers to a process whereby one or more copies of a particular nucleic acid sequence are generated from a template nucleic acid, in some embodiments by the method of polymerase chain reaction (Mullis and Faloona, 1987, Methods Enzymol., 155:335). “Polymerase chain reaction” or “PCR” refers to a method for amplifying a specific template nucleic acid sequence. In some embodiments, the PCR reaction involves a repetitive series of temperature cycles and is typically performed in a volume of 50-100 μl. The number of cycles performed in the PCR reaction can include 15, 20, 25, 30, 35, 40, 45, 50, 55 or 60 cycles. The reaction mix comprises dNTPs (each of the four deoxynucleotides dATP, dCTP, dGTP, and dTTP), primers, buffers, DNA polymerase, and nucleic acid template.

The PCR reaction can comprise two steps. The first step is providing a set of polynucleotide primers. A first primer contains a sequence complementary to a region in one strand of the nucleic acid template sequence and primes the synthesis of a complementary strand. A second primer contains a sequence complementary to a region in a second strand of the target nucleic acid sequence and primes the synthesis of a complementary strand. The second step is amplifying the nucleic acid template sequence using a nucleic acid polymerase as a template-dependent polymerizing agent, under conditions permissive for the following PCR cycling steps:

(i) annealing of the primers required for amplification to a target nucleic acid sequence contained within the template sequence; and
(ii) extending the primers, wherein the nucleic acid polymerase synthesizes a primer extension product.

“A set of polynucleotide primers” or “a set of PCR primers” or “a set of primers” can comprise two, three, four or more primers.
In some embodiments, nested PCR can be performed using a primer set in which a first subset of primers is used to amplify a single product. A second subset of primers, which hybridize to the product of the first subset, is then used to amplify a smaller version of the first product. In one embodiment, an exo-Pfu DNA polymerase is used to amplify a nucleic acid template in a PCR reaction. Other methods of amplification include, but are not limited to, ligase chain reaction (LCR), nucleic acid sequence-based amplification (NASBA), or any other method known in the art.
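The repetitive temperature cycling described above is often summarised with an idealized exponential model, N = N₀(1 + E)ⁿ. Here N₀ is the starting copy number, n is the number of cycles and E is the per-cycle efficiency (E = 1 means doubling). This model is standard background rather than something stated in the source. It ignores the plateau reached in late cycles, and the sketch below is only an illustration of it.

```python
import math


def expected_copies(initial_copies: float, cycles: int,
                    efficiency: float = 1.0) -> float:
    """Idealized exponential PCR: N = N0 * (1 + E) ** n, where E = 1.0 means
    the product doubles every cycle. Ignores the late-cycle plateau."""
    return initial_copies * (1.0 + efficiency) ** cycles


def cycles_to_reach(threshold_copies: float, initial_copies: float,
                    efficiency: float = 1.0) -> float:
    """Cycles needed to reach a detection threshold under the same model;
    this is the quantity a real-time PCR threshold cycle (Ct) reflects."""
    return math.log(threshold_copies / initial_copies) / math.log(1.0 + efficiency)


print(expected_copies(10, 30))    # about 1.07e10 copies after 30 cycles at 100% efficiency
print(cycles_to_reach(1e6, 1e3))  # about 9.97 cycles for a 1000-fold increase
```

Under this model, a sample with more starting template reaches a fixed threshold in fewer cycles. This is why lower Ct values in QRT-PCR indicate higher RNA levels.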

In one aspect, an “array” includes a specific set of probes, such as oligonucleotides and/or cDNAs (e.g., ESTs), localized onto a support. The probes correspond, in whole or in part and continuously or discontinuously, to regions of expressed genomic DNA. In one embodiment, the probes can correspond to the 5′ region, the 3′ region or the internal coding region of a biomarker RNA product of the invention. Mixtures can also be used: a 5′ end of one gene may be used as a target or a probe in combination with a 3′ end of another gene to achieve the same or similar biomarker RNA product level measurements.

As used herein, an “analog” of a reference proteinaceous agent includes any proteinaceous agent that possesses a similar or identical function as the reference proteinaceous agent but does not comprise a similar or identical amino acid sequence as the reference proteinaceous agent, and/or possess a similar or identical structure as the reference proteinaceous agent. A proteinaceous agent that has a similar amino acid sequence as a second proteinaceous agent is at least one of the following:

(a) a proteinaceous agent having an amino acid sequence that is at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95% or at least 99% identical to the amino acid sequence of the second proteinaceous agent;

(b) a proteinaceous agent encoded by a nucleotide sequence that hybridizes under stringent conditions to at least a segment of the nucleotide sequence encoding the second proteinaceous agent, where the segment has a length of at least 5, at least 10, at least 15, at least 20, at least 25, at least 40, at least 50, at least 60, at least 70, at least 80, at least 90, at least 100, at least 125, or at least 150 contiguous amino acid residues; and

(c) a proteinaceous agent encoded by a nucleotide sequence that is at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95% or at least 99% identical to the nucleotide sequence encoding the second proteinaceous agent.

A proteinaceous agent with similar structure to a second proteinaceous agent refers to a proteinaceous agent that has a similar secondary, tertiary or quaternary structure as the second proteinaceous agent. The structure of a proteinaceous agent can be determined by methods known to those skilled in the art, including but not limited to peptide sequencing, X-ray crystallography, nuclear magnetic resonance, circular dichroism, and crystallographic electron microscopy.

To determine the percent identity of two amino acid sequences or of two nucleic acid sequences, the sequences are first aligned for optimal comparison. For example, gaps can be introduced in a first amino acid or nucleic acid sequence for optimal alignment with a second amino acid or nucleic acid sequence. The amino acid residues or nucleotides at corresponding positions are then compared. When a position in the first sequence is occupied by the same amino acid residue or nucleotide as the corresponding position in the second sequence, the molecules are identical at that position. The percent identity between the two sequences is a function of the number of identical positions shared by the sequences, i.e., % identity = (number of identical overlapping positions / total number of positions) × 100%. In one embodiment, the two sequences are the same length.
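The percent identity ratio above can be sketched for two sequences that have already been aligned. This is an illustration only. It assumes gaps are written as "-" and counts gap columns in the total number of positions but never as identical positions; the text leaves that convention open. Producing the alignment itself is left to a tool such as BLAST or ALIGN.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """% identity = identical positions / total alignment positions * 100.

    Both inputs must already be aligned (same length, gaps written as `gap`).
    Gap columns count toward the total but never as identical positions.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    identical = sum(
        1 for a, b in zip(aligned_a, aligned_b) if a == b and a != gap
    )
    return identical / len(aligned_a) * 100.0


print(percent_identity("ACGTACGT", "ACGTTCGT"))  # 87.5
print(percent_identity("ACG-TA", "ACGGTA"))      # about 83.33
```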

The determination of percent identity between two sequences can also be accomplished using a mathematical algorithm. A preferred, non-limiting example of a mathematical algorithm utilized for the comparison of two sequences is the algorithm of Karlin and Altschul, 1990, Proc. Natl. Acad. Sci. U.S.A. 87:2264-2268, modified as in Karlin and Altschul, 1993, Proc. Natl. Acad. Sci. U.S.A. 90:5873-5877. Such an algorithm is incorporated into the NBLAST and XBLAST programs of Altschul et al., 1990, J. Mol. Biol. 215:403. BLAST nucleotide searches can be performed with the NBLAST program parameters set, e.g., to score=100, wordlength=12 to obtain nucleotide sequences homologous to a nucleic acid molecule of the present invention. BLAST protein searches can be performed with the XBLAST program parameters set, e.g., to score=50, wordlength=3 to obtain amino acid sequences homologous to a protein molecule of the present invention. To obtain gapped alignments for comparison purposes, Gapped BLAST can be utilized as described in Altschul et al., 1997, Nucleic Acids Res. 25:3389-3402. Alternatively, PSI-BLAST can be used to perform an iterated search which detects distant relationships between molecules (Id.). When utilizing the BLAST, Gapped BLAST, and PSI-BLAST programs, the default parameters of the respective programs (e.g., of XBLAST and NBLAST) can be used (see, e.g., the NCBI website). Another preferred, non-limiting example of a mathematical algorithm utilized for the comparison of sequences is the algorithm of Myers and Miller, 1988, CABIOS 4:11-17. Such an algorithm is incorporated in the ALIGN program (version 2.0), which is part of the GCG sequence alignment software package. When utilizing the ALIGN program for comparing amino acid sequences, a PAM120 weight residue table, a gap length penalty of 12, and a gap penalty of 4 can be used.

The percent identity between two sequences can be determined using techniques similar to those described above, with or without allowing gaps. In calculating percent identity, typically only exact matches are counted.

As used herein, the term “analog” in the context of non-proteinaceous molecules refers to a second organic or inorganic molecule which possesses a similar or identical function as a first organic or inorganic molecule and is structurally similar to the first organic or inorganic molecule. The term “analog” includes a molecule whose core structure is the same as or closely resembles that of the first molecule, but which has a chemical or physical modification. The term “analog” includes copolymers of the first molecule that can be linked to other atoms or molecules. The terms “biologically active analog” and “analog” are used interchangeably herein to cover an organic or inorganic molecule that exhibits substantially the same agonist or antagonist effect as the first organic or inorganic molecule.

A “nucleotide analog”, as used herein, refers to a nucleotide in which the pentose sugar and/or one or more of the phosphate esters is replaced with its respective analog. Exemplary phosphate ester analogs include, but are not limited to, alkylphosphonates, methylphosphonates, phosphoramidates, phosphotriesters, phosphorothioates, phosphorodithioates, phosphoroselenoates, phosphorodiselenoates, phosphoroanilothioates, phosphoroanilidates, phosphoroamidates, boronophosphates, etc., including any associated counterions, if present. Also included within the definition of “nucleotide analog” are nucleobase monomers which can be polymerized into polynucleotide analogs in which the DNA/RNA phosphate ester and/or sugar phosphate ester backbone is replaced with a different type of linkage. Further included within “nucleotide analogs” are nucleotides in which the nucleobase moiety is non-conventional, i.e., differs from one of G, A, T, U or C. Generally, a non-conventional nucleobase will either have the capacity to form hydrogen bonds with at least one nucleobase moiety present on an adjacent counter-directional polynucleotide strand, or provide a non-interacting, non-interfering base.

The term “antibody” also encompasses antigen-binding fragments of an antibody. The term “antigen-binding fragment” of an antibody (or simply “antibody portion,” or “fragment”), as used herein, refers to one or more fragments of a full-length antibody that retain the ability to specifically bind to a polypeptide encoded by one of the genes of a biomarker of the invention. Examples of binding fragments encompassed within the term “antigen-binding fragment” of an antibody include:

(i) a Fab fragment, a monovalent fragment consisting of the VL, VH, CL and CH1 domains;
(ii) a F(ab′)₂ fragment, a bivalent fragment comprising two Fab fragments linked by a disulfide bridge at the hinge region;
(iii) a Fd fragment consisting of the VH and CH1 domains;
(iv) a Fv fragment consisting of the VL and VH domains of a single arm of an antibody;
(v) a dAb fragment (Ward et al., (1989) Nature 341:544-546), which consists of a VH domain; and
(vi) an isolated complementarity determining region (CDR).

Furthermore, although the two domains of the Fv fragment, VL and VH, are coded for by separate genes, they can be joined, using recombinant methods, by a synthetic linker. The linker enables them to be made as a single protein chain in which the VL and VH regions pair to form monovalent molecules, known as single chain Fv (scFv) (see, e.g., Bird et al. (1988) Science 242:423-426; and Huston et al. (1988) Proc. Natl. Acad. Sci. USA 85:5879-5883). Such single chain antibodies are also intended to be encompassed within the term “antigen-binding fragment” of an antibody. These antibody fragments are obtained using conventional techniques known to those with skill in the art, and the fragments are screened for utility in the same manner as are intact antibodies. The antibody is in some embodiments monospecific, e.g., a monoclonal antibody, or antigen-binding fragment thereof. The term “monospecific antibody” refers to an antibody that displays a single binding specificity and affinity for a particular target, e.g., epitope. This term includes a “monoclonal antibody” or “monoclonal antibody composition,” which as used herein refer to a preparation of antibodies or fragments thereof of single molecular composition.

As used herein, the terms “attaching” and “spotting” in relation to an array can include a process of depositing or localizing a nucleic acid or proteinaceous agent onto a substrate to form a nucleic acid or protein array. In one embodiment, the substance spotted is attached or localized onto the array via covalent bonds, hydrogen bonds or ionic interactions.

As used herein, the term “biomarker” refers to a gene whose products can be measured and correlated with disease. Specifically, a biomarker is a gene that encodes one or more products (e.g., unspliced RNA, mRNA and/or polypeptide) present at measurably different levels in corresponding samples isolated and/or derived from subjects having the pathology and subjects not having the pathology. A biomarker may be a DNA molecule which is transcribed into an RNA product. Alternately, the biomarker may be an RNA molecule which is translated into a protein product, or reverse-transcribed into a DNA product.

As used herein, a “blood nucleic acid sample” or “blood polynucleotide sample” refers to polynucleotides derived from blood. It can include polynucleotides isolated and/or derived from whole blood, serum-reduced whole blood, lysed blood (erythrocyte depleted blood), centrifuged lysed blood (serum-depleted, erythrocyte depleted blood), serum depleted whole blood or peripheral blood leukocytes (PBLs), globin reduced RNA from blood, or any possible fraction of blood as would be understood by a person skilled in the art. A blood polynucleotide sample can refer to RNA, mRNA or a nucleic acid corresponding to mRNA, for example, cDNA or EST derived from RNA isolated from said sample. A blood polynucleotide sample can also include a PCR product derived from RNA, mRNA or cDNA.

As used herein, the term “formula” includes one or more classifiers, or a combination of classifiers, where the term “classifier” describes the output of a mathematical model.

As used herein the term “colorectal pathology” comprises any of one or more types or subtypes of pathology of the rectum and/or colon. “Colorectal pathologies” include pre-cancerous polyps, cancerous polyps, polyps at risk of becoming cancerous, and polyps of unknown cancer-related status. As would be understood, in some cases a subject according to embodiments of the invention can have at any one time one or more colorectal pathologies, each of which can be of the same or a different type or subtype of polyp. Colorectal pathologies may be classified in any one of various ways, for example as is known in the art. In one embodiment, “polyp” or “colorectal polyp”, as would be understood in the art, includes an abnormal growth of cells and/or tissue, and/or a growth of cells and/or tissue that may project into the colon or rectum. A polyp can be further defined according to various factors, including the morphology of the polyp, the risk of the polyp developing into a cancerous polyp and the like, as would be understood by a person ordinarily skilled in the art. In one embodiment, polyps can be classified into various subtypes including: Hyperplastic; Tubular Adenoma; Villous Adenoma; Tubulovillous Adenoma; Hyperplasia; High Grade Dysplasia; and Cancer. For any one individual and/or polyp, one or more polyp subtype descriptions could apply. In another embodiment, colorectal cancer (the seventh listed subtype) can be subclassified into various categories as well. In yet another embodiment, one or more of the listed subtypes can be grouped together in one category according to any one of various parameters. Alternately, one or more of the listed subtypes can be further subclassified according to any one of various parameters.
For example, in one embodiment, Tubular Adenoma polyps can be further classified in accordance with the diameter of the adenoma polyp. For example, adenoma polyps can be grouped according to whether their diameter is greater than 1 mm, 2 mm, 3 mm, 4 mm, 5 mm, 6 mm, 7 mm, 8 mm, 9 mm, 10 mm, 11 mm, 12 mm, 13 mm, 14 mm or 15 mm. In yet another example, colorectal cancer can be further subclassified in accordance with disease progression as would be understood. For example, colorectal cancer can be subclassified using the Dukes or Modified Dukes Staging System. The Modified Dukes Staging System groups colorectal cancer into four different stages, A-D. Stage A indicates the tumor penetrating the mucosa of the colon and/or bowel but no further. Stage B1 indicates the tumor penetrating into, but not through, the muscularis propria (the muscular layer) of the colon and/or bowel wall. Stage B2 indicates the tumor has penetrated into and through the muscularis propria of the colon and/or bowel wall. A Stage C1 tumor penetrates into, but not through, the muscularis propria of the colon and/or bowel wall, and there is pathologic evidence of colorectal cancer in the lymph nodes. A Stage C2 tumor penetrates into and through the muscularis propria of the bowel wall, and there is pathologic evidence of colorectal cancer in the lymph nodes. Finally, Stage D indicates the tumor has spread beyond the lymph nodes to other organs. In yet another embodiment, colorectal cancer can be subclassified using the TNM staging system. According to the TNM staging system there are four stages, stages I through IV, each reflecting status regarding Tumor, Node, and Metastasis. Tumor is subdivided as follows: T1: tumor invades submucosa; T2: tumor invades muscularis propria; T3: tumor invades through the muscularis propria into the subserosa, or into the pericolic or perirectal tissues; and T4: tumor directly invades other organs or structures, and/or perforates. Node is subdivided as follows: N0 indicates no regional lymph node metastasis.
N1 indicates metastasis in 1 to 3 regional lymph nodes. N2 indicates metastasis in 4 or more regional lymph nodes. Metastasis is divided as follows: M0 indicates no distant metastasis and M1 indicates distant metastasis present. Thus, for Stage I in accordance with the TNM system, the tumor can be categorized as either T1N0M0 or T2N0M0; the cancer has begun to spread but is still confined to the wall of the colon or rectum. Stage II is T3N0M0 or T4N0M0; the cancer may have spread to tissues or organs near the colon or rectum but has not yet reached the lymph nodes. Stage III includes any T, N1-2 and M0; the cancer has spread to lymph nodes but has not been carried to distant parts of the body. Stage IV includes any T, any N and M1; the cancer is metastatic and has been carried to other organs, typically the lung or liver.
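The TNM stage grouping just described amounts to a small lookup. The sketch below encodes only the simplified four-stage scheme given in this paragraph; current AJCC editions add substages (such as IIA or IIIB) that it ignores. The function name and string encodings of the categories are assumptions for illustration.

```python
def tnm_stage(t, n, m):
    """Map TNM categories to stages I-IV using the simplified grouping above.

    t: "T1".."T4"; n: "N0", "N1" or "N2"; m: "M0" or "M1".
    """
    if t not in {"T1", "T2", "T3", "T4"} or n not in {"N0", "N1", "N2"} or m not in {"M0", "M1"}:
        raise ValueError(f"unrecognized TNM category: {t}{n}{m}")
    if m == "M1":
        return "IV"   # any T, any N, distant metastasis present
    if n in {"N1", "N2"}:
        return "III"  # any T, regional lymph nodes involved, M0
    if t in {"T1", "T2"}:
        return "I"    # T1N0M0 or T2N0M0
    return "II"       # T3N0M0 or T4N0M0


print(tnm_stage("T2", "N0", "M0"))  # I
```

Checking M first, then N, mirrors the precedence in the text: distant metastasis defines Stage IV regardless of T and N, and nodal involvement defines Stage III regardless of T.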

As used herein, the term “high risk polyps” indicates those subtypes of polyps which are considered at higher risk for developing into cancer or are already cancerous, as would be understood by a person skilled in the art, and includes cancer-prone polyps or cancer-disposed polyps and cancerous polyps, whereas a “low risk polyp” includes all other types of polyps. For example, 70 to 90 percent of colorectal cancers arise from adenomatous polyps, and adenomatous polyps are therefore considered high risk polyps. Adenomatous polyps can be further categorized into subtypes including: Tubular adenoma, which has been suggested to have approximately a 4% potential for malignancy; Tubulovillous adenoma, which has been suggested to have approximately a 16% potential for malignancy; and Villous adenoma, which has been suggested to have approximately a 21% potential for malignancy. In addition, high grade dysplasia has increased malignant potential. In one embodiment, “high risk polyps” are Tubulovillous Adenoma, Villous Adenoma, High Grade Dysplasia and Tubular Adenoma, and also include polyps which are cancerous, including those which are cancerous and localized and those which have already led to dissemination in the peripheral blood. In this embodiment, a “low risk polyp” includes any other polyp morphology. In another embodiment, “high risk polyps” are Tubulovillous Adenoma, Villous Adenoma, High Grade Dysplasia, and Tubular Adenoma and do not include polyps which are already cancerous. The size of the polyp also correlates with the risk of developing into cancer. For example, polyps greater than 10 mm in diameter are considered large polyps and have a greater potential for malignancy. Polyps larger than 2 cm in diameter have a 50 percent chance of becoming malignant. See Zauber (2004) Gastroenterology; 126(5): 1474.
In another embodiment, “high risk polyps” comprise Tubulovillous Adenoma, Villous Adenoma, High Grade Dysplasia, and Tubular Adenoma where the diameter of the Tubular Adenoma polyp is greater than 10 mm, and the remaining polyp morphologies are considered “low risk polyps.”
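The last embodiment is a rule over subtype plus diameter. A minimal sketch, assuming subtypes are recorded as the strings used in the text and diameters in millimetres; treating a Tubular Adenoma of unknown diameter as low risk is an assumption of this sketch, not something the text specifies.

```python
# Subtypes that count as high risk regardless of size in this embodiment.
HIGH_RISK_SUBTYPES = {"Tubulovillous Adenoma", "Villous Adenoma", "High Grade Dysplasia"}


def polyp_risk(subtype, diameter_mm=None):
    """Label a polyp "high risk" or "low risk".

    Follows the embodiment in which Tubular Adenoma is high risk only when
    its diameter exceeds 10 mm; every other morphology is low risk.
    """
    if subtype in HIGH_RISK_SUBTYPES:
        return "high risk"
    if subtype == "Tubular Adenoma" and diameter_mm is not None and diameter_mm > 10:
        return "high risk"
    return "low risk"


print(polyp_risk("Tubular Adenoma", 12))  # high risk
print(polyp_risk("Tubular Adenoma", 8))   # low risk
```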

As used herein, the terms “compound” and “agent” are used interchangeably.

As used herein, the term “control” or “control sample” can include one or more samples isolated and/or derived from a subject or group of subjects who have been diagnosed as having one or more colorectal pathologies, including having one or more polyps or having one or more subtypes of polyps; not having colorectal pathologies; not having polyps; or not having one or more subtypes of polyps. The term control or control sample can also refer to the compilation of data derived from one or more samples of one or more subjects.

A “coding region” in reference to a DNA refers to DNA which encodes RNA.

A “coding region” in reference to RNA refers to RNA which encodes protein.

As used herein, the term “data” in relation to one or more biomarkers, or the term “biomarker data”, generally refers to data reflective of the absolute and/or relative abundance (level) of a product of a biomarker in a sample. As used herein, the term “dataset” in relation to one or more biomarkers refers to a set of data representing levels of each of one or more biomarker products of a panel of biomarkers in a reference population of subjects. A dataset can be used to generate a formula/classifier of the invention. According to one embodiment, the dataset need not comprise data for each biomarker product of the panel for each individual of the reference population. For example, the “dataset”, when used in the context of a dataset to be applied to a formula, can refer to data representing levels of products of each biomarker for each individual in one or more reference populations, but, as would be understood, can also refer to data representing levels of products of each biomarker for 99%, 95%, 90%, 85%, 80%, 75%, 70% or less of the individuals in each of said one or more reference populations and can still be useful for purposes of applying to a formula.
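Because a dataset may lack some biomarker levels for some individuals, a simple coverage check can show whether it meets a chosen completeness fraction. A minimal sketch, assuming a dictionary layout (individual id to biomarker levels, with a missing key or None meaning “not measured”); the layout, the biomarker names and the 70% default, taken from the lowest fraction listed above, are illustrative assumptions.

```python
def coverage(dataset, biomarkers):
    """Fraction of individuals with a recorded level for each biomarker.

    dataset: individual id -> {biomarker name: level}; a missing key or a
    None value means the level was not measured for that individual.
    """
    n = len(dataset)
    return {
        b: sum(1 for levels in dataset.values() if levels.get(b) is not None) / n
        for b in biomarkers
    }


def usable_for_formula(dataset, biomarkers, min_fraction=0.70):
    """True if every biomarker has data for at least min_fraction of individuals."""
    return all(f >= min_fraction for f in coverage(dataset, biomarkers).values())


# Hypothetical reference population of four individuals.
dataset = {
    "s1": {"GENE_A": 5.1, "GENE_B": 2.0},
    "s2": {"GENE_A": 4.8, "GENE_B": None},
    "s3": {"GENE_A": 6.0, "GENE_B": 1.7},
    "s4": {"GENE_A": 5.5, "GENE_B": 2.2},
}
print(coverage(dataset, ["GENE_A", "GENE_B"]))  # {'GENE_A': 1.0, 'GENE_B': 0.75}
```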

As used herein, the term “derivative” in the context of a proteinaceous agent (e.g., proteins, polypeptides, peptides, and antibodies) refers to a proteinaceous agent that comprises an amino acid sequence which has been altered by the introduction of amino acid residue substitutions, deletions, and/or additions. The term “derivative” as used herein also refers to a proteinaceous agent which has been modified, i.e., by the covalent attachment of any type of molecule to the proteinaceous agent. For example, but not by way of limitation, an antibody may be modified, e.g., by glycosylation, acetylation, pegylation, phosphorylation, amidation, derivatization by known protecting/blocking groups, proteolytic cleavage, linkage to a cellular ligand or other protein, etc. A derivative of a proteinaceous agent may be produced by chemical modifications using techniques known to those of skill in the art, including, but not limited to, specific chemical cleavage, acetylation, formylation, metabolic synthesis of tunicamycin, etc. Further, a derivative of a proteinaceous agent may contain one or more non-classical amino acids. A derivative of a proteinaceous agent possesses a function similar or identical to that of the proteinaceous agent from which it was derived.

As used herein, the term “derivative” in the context of a non-proteinaceous derivative refers to a second organic or inorganic molecule that is formed based upon the structure of a first organic or inorganic molecule. A derivative of an organic molecule includes, but is not limited to, a molecule modified, e.g., by the addition or deletion of a hydroxyl, methyl, ethyl, carboxyl or amine group. An organic molecule may also be esterified, alkylated and/or phosphorylated.

As used herein the terms “testing”, “diagnosis” and “screening”, in relation to colorectal pathologies, refer to a process of determining a likelihood (probability) of a test subject having one or more colorectal pathologies, and include both traditional medical diagnostic techniques and testing methods as encompassed by one or more aspects of the current invention. Traditional medical diagnostic techniques for testing for colorectal pathology include physical exam and history, medical evaluation, and appropriate laboratory tests, which can include FOBT, sigmoidoscopy and colonoscopy. In one embodiment, “diagnosis of colorectal pathology” refers to a determination as between two options: e.g., (i) that an individual has colorectal pathology, one or more subtypes of colorectal pathology, or one or more polyps; and (ii) that an individual does not have the colorectal pathology, the one or more polyps or the one or more subtypes of polyps. In another embodiment, diagnosis can include a further option: that it cannot be determined with a sufficient degree of certainty whether an individual can be characterized as having colorectal pathology or not. In one context, a “sufficient degree of certainty” takes into account any limitations, such as limitations in the technology, equipment or measurement, that cause the result to fall within a range suggesting that the test is indeterminate. The range which suggests the test is indeterminate will depend upon the specific limitations of the equipment, reagents and technology used. In another context, “sufficient degree of certainty” depends upon the medical requirements for the sensitivity and/or specificity of the test.
More particularly, the sufficient degree of certainty includes greater than 50% sensitivity and/or specificity, greater than 60% sensitivity and/or specificity, greater than 70% sensitivity and/or specificity, greater than 80% sensitivity and/or specificity, greater than 90% sensitivity and/or specificity, or 100% sensitivity and/or specificity.
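Sensitivity and specificity follow the standard definitions: sensitivity = TP / (TP + FN), the fraction of subjects with colorectal pathology who test positive, and specificity = TN / (TN + FP), the fraction without it who test negative. The sketch below computes both from paired calls; the nine-subject results are hypothetical, and it assumes both classes are present so neither denominator is zero.

```python
def sensitivity_specificity(predicted, actual):
    """Return (sensitivity, specificity) for paired boolean calls.

    predicted[i] / actual[i] are True when subject i is called / truly has
    colorectal pathology. Assumes both classes occur in `actual`.
    """
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)
    fn = sum(1 for p, a in pairs if not p and a)
    tn = sum(1 for p, a in pairs if not p and not a)
    fp = sum(1 for p, a in pairs if p and not a)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical results for nine subjects (four with pathology, five without).
actual = [True, True, True, True, False, False, False, False, False]
predicted = [True, True, True, False, False, False, False, False, True]
sens, spec = sensitivity_specificity(predicted, actual)
print(sens, spec)  # 0.75 0.8
```

In this made-up example the test clears the “greater than 70%” tier for both measures but not the “greater than 80%” tier for sensitivity.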

As used herein, “normal” refers to an individual or group of individuals who do not have colorectal pathology. In some embodiments, the diagnosis of said individual or group of individuals not having colorectal pathology is determined using conventional diagnostic methods. In some embodiments, said individual or group of individuals has not been diagnosed with any other disease. “Normal,” according to the invention, also refers to samples isolated from normal individuals and includes blood, total RNA or mRNA isolated from normal individuals. A sample taken from a normal individual can include a sample taken from an individual who does not have colorectal pathology at the time the sample is taken.

As used herein, the term “differential expression” refers to a difference in the level of expression of the products of one or more biomarkers. For instance, the term “differential expression” can refer to the difference in the level of RNA of one or more biomarkers between samples from subjects having and subjects not having one or more colorectal pathologies. Differences in biomarker RNA product levels can be determined by directly or indirectly measuring the amount or level of RNA corresponding to the biomarkers. “Differentially expressed” can also include different levels of protein encoded by the biomarker of the invention between samples or reference populations. Differential expression can be determined as the ratio of the levels of one or more biomarker products between reference subjects/populations having or not having one or more colorectal pathologies, wherein the ratio is not equal to 1.0. Differential expression between populations can be determined to be statistically significant as a function of p-value. When p-value is used to determine the statistical significance of a biomarker, the p-value is preferably less than 0.2. In another embodiment, the biomarker is identified as being differentially expressed when the p-value is less than 0.15, 0.1, 0.05, 0.01, 0.005, 0.0001, etc. When determining differential expression on the basis of the ratio, a biomarker product is differentially expressed if the ratio of the level of expression in a first sample as compared with a second sample is greater than or less than 1.0. For example, a ratio of greater than 1.0 includes a ratio of greater than 1.1, 1.2, 1.5, 1.7, 2, 3, 4, 10, 20, and the like.
A ratio of less than 1.0, for example, includes a ratio of less than 0.9, 0.8, 0.6, 0.4, 0.2, 0.1, 0.05, and the like. In another embodiment of the invention, a biomarker product is differentially expressed if the ratio of the mean level of expression of a first population as compared with the mean level of expression of a second population is greater than or less than 1.0. For example, a ratio of greater than 1.0 includes a ratio of greater than 1.1, 1.2, 1.5, 1.7, 2, 3, 4, 10, 20, and the like, and a ratio of less than 1.0 includes, for example, a ratio of less than 0.9, 0.8, 0.6, 0.4, 0.2, 0.1, 0.05, and the like. In another embodiment of the invention, a biomarker product is differentially expressed if the ratio of its level of expression in a first sample as compared with the mean of the second population is greater than or less than 1.0, and includes, for example, a ratio of greater than 1.1, 1.2, 1.5, 1.7, 2, 3, 4, 10, 20, or a ratio less than 1, for example 0.9, 0.8, 0.6, 0.4, 0.2, 0.1, 0.05.
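The text does not say which statistical test produces the p-value. As one possible choice, the sketch below pairs the ratio of population means with a two-sided permutation test on the difference in means, using only the standard library; a t-test or another method could be substituted. The level values are hypothetical, and the small tolerance in the comparison guards against floating-point differences when a shuffle reproduces the observed split.

```python
import random


def mean(values):
    return sum(values) / len(values)


def ratio_of_means(first, second):
    """Mean level in the first population divided by mean level in the second."""
    return mean(first) / mean(second)


def permutation_p_value(first, second, n_permutations=10_000, seed=0):
    """Two-sided permutation p-value for a difference in mean levels.

    Group labels are shuffled repeatedly; the p-value is the fraction of
    shuffles whose absolute mean difference is at least the observed one.
    """
    rng = random.Random(seed)
    observed = abs(mean(first) - mean(second))
    pooled = list(first) + list(second)
    k = len(first)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        if abs(mean(pooled[:k]) - mean(pooled[k:])) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_permutations + 1)  # add-one keeps p above zero


# Hypothetical normalized levels of one biomarker product in two groups.
with_pathology = [8.1, 7.9, 8.4, 8.0, 8.3]
without_pathology = [5.0, 5.2, 4.9, 5.1, 5.3]
r = ratio_of_means(with_pathology, without_pathology)
p = permutation_p_value(with_pathology, without_pathology)
print(round(r, 2), p < 0.05)
```

Under the embodiments above, this hypothetical biomarker product would count as differentially expressed: its ratio of means is greater than 1.0 and its p-value falls below the listed cut-offs down to 0.05.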

“Differentially increased expression” or “up regulation” refers to biomarker product levels which are at least 10% or more, for example, 20%, 30%, 40%, 50%, 60%, 70%, 80%, or 90% higher or more, and/or 1.1 fold, 1.2 fold, 1.4 fold, 1.6 fold, 1.8 fold higher or more, than a control.

“Differentially decreased expression” or “down regulation” refers to biomarker product levels which are at least 10% lower than a control, for example, 20%, 30%, 40%, 50%, 60%, 70%, 80%, or 90% lower or more, and/or which are 0.9 fold, 0.8 fold, 0.6 fold, 0.4 fold, 0.2 fold, or 0.1 fold of the control level or less.
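The two definitions above share a 10% minimum change relative to a control. A minimal sketch of that rule as a fold-change test, assuming levels on a linear (not log) scale and a positive control level, which may be a single control sample or a population mean; the function name and labels are hypothetical.

```python
def regulation(test_level, control_level, min_change=0.10):
    """Classify a biomarker product level relative to a control level.

    "up":   at least min_change (default 10%) above the control
    "down": at least min_change below the control
    otherwise "unchanged"
    """
    if control_level <= 0:
        raise ValueError("control level must be positive")
    fold = test_level / control_level
    if fold >= 1.0 + min_change:
        return "up"
    if fold <= 1.0 - min_change:
        return "down"
    return "unchanged"


print(regulation(150.0, 100.0))  # up (1.5 fold)
print(regulation(40.0, 100.0))   # down (0.4 fold)
print(regulation(103.0, 100.0))  # unchanged (1.03 fold)
```

Raising min_change (e.g. to 0.5) reproduces the stricter thresholds listed in the definitions.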

For example, up regulated or down regulated genes include genes having an increased or decreased level, respectively, of expression of product (e.g., mRNA or protein) in blood isolated from individuals characterized as having one or more colorectal pathologies as compared with normal individuals. In another example, up regulated or down regulated genes include genes having an increased or decreased level, respectively, of expression of product (e.g., mRNA or protein) in blood isolated from individuals having one type of colorectal pathology or collection of colorectal pathologies as compared with individuals having a different type of colorectal pathology or collection of colorectal pathologies.

For example, up regulated genes include genes having an increased level of biomarker products in a test sample as compared with a control sample.

As used herein, the term “differential hybridization” refers to a difference in a quantitative level of hybridization of a nucleic acid or derivative thereof isolated and/or derived from a sample from a first individual or individuals with a trait to a complementary nucleic acid target, as compared with the hybridization of a nucleic acid or derivative thereof isolated and/or derived from a second individual or individuals not having said trait to a complementary nucleic acid target. A “differential hybridization” means that the ratio of the level of hybridization of the first sample as compared with the second sample is not equal to 1.0. For example, the ratio of the level of hybridization of the first sample to the target as compared to the second sample is greater than 1.1, 1.2, 1.5, 1.7, 2, 3, 4, 10, 20, or less than 1, for example 0.9, 0.8, 0.6, 0.4, 0.2, 0.1, 0.05. A differential hybridization also exists if the hybridization is detectable in one sample but not another sample.

As used herein, the term “drug efficacy” refers to the effectiveness of a drug. “Drug efficacy” is usually measured by the clinical response of the patient who has been or is being treated with a drug. A drug is considered to have a high degree of efficacy if it achieves desired clinical results, for example, the alteration of gene expression and the gene expression pattern reflective of one or more colorectal pathologies as described herein. The amount of drug absorbed may be used to predict a patient's response. A general rule is that as the dose of a drug is increased, a greater effect is seen in the patient until a maximum desired effect is reached. If more drug is administered after the maximum point is reached, the side effects will normally increase.

As used herein, the term “effective amount” refers to the amount of a compound which is sufficient to reduce or prevent the progression and/or severity of one or more colorectal pathologies; prevent the development, recurrence or onset of one or more colorectal pathologies; or enhance or improve the prophylactic or therapeutic effect(s) of another therapy.

As used herein, the term “fragment” in the context of a proteinaceous agent refers to a peptide or polypeptide comprising an amino acid sequence of at least 5, at least 10, at least 15, at least 20, at least 25, at least 40, at least 50, at least 60, at least 70, at least 80, at least 90, at least 100, at least 125, at least 150, at least 175, at least 200, or at least 250 contiguous amino acid residues of the amino acid sequence of another polypeptide or a protein. In a specific embodiment, a fragment of a protein or polypeptide retains at least one function of the protein or polypeptide. In another embodiment, a fragment of a protein or polypeptide retains at least two, three, four, or five functions of the protein or polypeptide. In some embodiments, a fragment of an antibody retains the ability to immunospecifically bind to an antigen.

As used herein, the term “fusion protein” refers to a polypeptide that comprises an amino acid sequence of a first protein or polypeptide or functional fragment, analog or derivative thereof, and an amino acid sequence of a heterologous protein, polypeptide, or peptide (i.e., a second protein or polypeptide or fragment, analog or derivative thereof different than the first protein or fragment, analog or derivative thereof). In one embodiment, a fusion protein comprises a prophylactic or therapeutic agent fused to a heterologous protein, polypeptide or peptide. In accordance with this embodiment, the heterologous protein, polypeptide or peptide may or may not be a different type of prophylactic or therapeutic agent.

As used herein, a “gene” of the invention can include a gene expressed in blood, a gene expressed in blood and in a non-blood tissue, a gene differentially expressed in blood, a gene expressed in a non-blood cell, a gene expressed in a cell which is not of haematopoietic origin, or a gene expressed in a specific subtype of cell found in blood, including lymphocytes, granulocytes, leukocytes, basophils and the like. A gene can be an immune response gene or a gene not involved in an immune response. In particular, an immune response gene is a gene in the major histocompatibility complex that controls a cell's response to a foreign antigen. A gene of the invention can also include a gene which is differentially regulated in response to a foreign antigen introduced into peripheral blood.

As used herein, a “gene expression pattern” or “gene expression profile” indicates the pattern of the level of expression of two or more biomarkers of the invention, including 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18 or more or all of the biomarkers of the invention. A gene expression pattern or gene expression profile can be determined from the measurement of expression levels of the products of the biomarkers of the invention using any known technique. For example, techniques to measure expression of the RNA products of the biomarkers of the invention include PCR-based methods (including reverse transcription-PCR, PCR and QRT-PCR), non-PCR-based methods, and microarray analysis. To measure levels of protein products of the biomarkers of the invention, techniques include densitometric western blotting and ELISA analysis.

As used herein, the term “hybridizing to” or “hybridization” refers to the sequence-specific non-covalent binding interactions with a complementary nucleic acid, for example, interactions between a target nucleic acid sequence and a nucleic acid member on an array.

As used herein, the term “immunoglobulin” refers to a protein consisting of one or more polypeptides substantially encoded by immunoglobulin genes. The recognized human immunoglobulin genes include the kappa, lambda, alpha (IgA1 and IgA2), gamma (IgG1, IgG2, IgG3, IgG4), delta, epsilon and mu constant region genes, as well as the myriad immunoglobulin variable region genes. Full-length immunoglobulin “light chains” (about 25 kDa or 214 amino acids) are encoded by a variable region gene at the NH2-terminus (about 110 amino acids) and a kappa or lambda constant region gene at the COOH-terminus. Full-length immunoglobulin “heavy chains” (about 50 kDa or 446 amino acids) are similarly encoded by a variable region gene (about 116 amino acids) and one of the other aforementioned constant region genes, e.g., gamma (encoding about 330 amino acids).

As used herein, the term “in combination” when referring to therapeutic treatments refers to the use of more than one type of therapy (e.g., more than one prophylactic agent and/or therapeutic agent). The use of the term “in combination” does not restrict the order in which therapies (e.g., prophylactic and/or therapeutic agents) are administered to a subject. A first therapy (e.g., a first prophylactic or therapeutic agent) can be administered prior to (e.g., 5 minutes, 15 minutes, 30 minutes, 45 minutes, 1 hour, 2 hours, 4 hours, 6 hours, 12 hours, 24 hours, 48 hours, 72 hours, 96 hours, 1 week, 2 weeks, 3 weeks, 4 weeks, 5 weeks, 6 weeks, 8 weeks, or 12 weeks before), concomitantly with, or subsequent to (e.g., 5 minutes, 15 minutes, 30 minutes, 45 minutes, 1 hour, 2 hours, 4 hours, 6 hours, 12 hours, 24 hours, 48 hours, 72 hours, 96 hours, 1 week, 2 weeks, 3 weeks, 4 weeks, 5 weeks, 6 weeks, 8 weeks, or 12 weeks after) the administration of a second therapy (e.g., a second prophylactic or therapeutic agent) to a subject.

As used herein, “indicative of one or more colorectal pathologies” refers to a determination of a probability that a subject has or will have the one or more colorectal pathologies. In one aspect, the application of a formula to data corresponding to biomarker products of a test subject can result in determination of the probability that the test subject has one or more colorectal pathologies as compared with not having said one or more colorectal pathologies. In another embodiment, an expression pattern can be indicative of one or more colorectal pathologies, including one or more polyps or one or more subtypes of polyps, if the expression pattern is found significantly more often in patients with said colorectal pathology than in patients without said colorectal pathology (as determined using routine statistical methods setting confidence levels at a minimum of 70%, 75%, 80%, 85%, 90%, 95% and the like). In some embodiments, an expression pattern which is indicative of disease is found in at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95% or more of patients who have the disease and is found in less than 10%, less than 8%, less than 5%, less than 2.5%, or less than 1% of patients who do not have the disease. “Indicative of colorectal pathology” can also indicate an expression pattern which more properly categorizes with control expression patterns of individuals with the one or more colorectal pathologies than with control expression patterns of individuals without the one or more colorectal pathologies, using statistical algorithms for class prediction as would be understood by a person skilled in the art; see, for example, commercially available programs such as those provided by Silicon Genetics (e.g., GeneSpring™).
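The frequency-based embodiment above (pattern present in at least 70% of patients with the disease and in less than 10% of patients without it) can be checked directly once each patient is marked as showing the pattern or not. This sketch encodes only that frequency rule with the least strict thresholds listed; it does not implement the significance testing or class-prediction algorithms also mentioned, and the cohort data are hypothetical.

```python
def pattern_is_indicative(seen_with_disease, seen_without_disease,
                          min_with=0.70, max_without=0.10):
    """Frequency rule for an expression pattern.

    Each argument is a list of booleans recording whether the pattern was
    observed in each patient of that group.
    """
    freq_with = sum(seen_with_disease) / len(seen_with_disease)
    freq_without = sum(seen_without_disease) / len(seen_without_disease)
    return freq_with >= min_with and freq_without < max_without


# Hypothetical cohort: pattern seen in 8 of 10 patients with disease.
with_disease = [True] * 8 + [False] * 2
print(pattern_is_indicative(with_disease, [True] * 1 + [False] * 19))  # True (5%)
print(pattern_is_indicative(with_disease, [True] * 3 + [False] * 17))  # False (15%)
```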

As used herein, “isolated” or “purified” when used in reference to a nucleic acid means that a naturally occurring sequence has been removed from its normal cellular (e.g., chromosomal) environment or is synthesized in a non-natural environment (e.g., artificially synthesized). Thus, an “isolated” or “purified” sequence may be in a cell-free solution or placed in a different cellular environment. The term “purified” does not imply that the sequence is the only nucleotide present, but that it is essentially free (about 90-95% pure) of non-nucleotide material naturally associated with it, and thus is distinguished from isolated chromosomes.

As used herein, the terms “isolated” and “purified” in the context of a proteinaceous agent (e.g., a peptide, polypeptide, protein or antibody) refer to a proteinaceous agent which is substantially free of cellular material and, in some embodiments, substantially free of heterologous proteinaceous agents (i.e., contaminating proteins) from the cell or tissue source from which it is derived, or substantially free of chemical precursors or other chemicals when chemically synthesized. The language “substantially free of cellular material” includes preparations of a proteinaceous agent in which the proteinaceous agent is separated from cellular components of the cells from which it is isolated or recombinantly produced. Thus, a proteinaceous agent that is substantially free of cellular material includes preparations of a proteinaceous agent having less than about 40%, 30%, 20%, 10%, or 5% (by dry weight) of heterologous proteinaceous agent (e.g., protein, polypeptide, peptide, or antibody; also referred to as a “contaminating protein”). When the proteinaceous agent is recombinantly produced, it is also in some embodiments substantially free of culture medium, i.e., culture medium represents less than about 20%, 10%, or 5% of the volume of the protein preparation. When the proteinaceous agent is produced by chemical synthesis, it is in some embodiments substantially free of chemical precursors or other chemicals, i.e., it is separated from chemical precursors or other chemicals which are involved in the synthesis of the proteinaceous agent. Accordingly, such preparations of a proteinaceous agent have less than about 30%, 20%, 10%, or 5% (by dry weight) of chemical precursors or compounds other than the proteinaceous agent of interest. In some embodiments, proteinaceous agents disclosed herein are isolated.

As used herein, a sample which is “isolated and/or derived” includes a sample which has been removed from its natural environment in a subject and also includes samples which are further modified or altered. For example, samples can include tissue, lymph, bodily fluid, blood, RNA, protein, mRNA, serum-reduced blood, erythrocyte-reduced blood, serum-reduced and erythrocyte-reduced blood, unfractionated cells of a lysed blood, globin-reduced mRNA, cDNA, PCR products and the like.

As used herein, the term “level” or “level of expression” when referring to RNA refers to a measurable quantity (either absolute or relative quantity) of a given nucleic acid as determined by hybridization or measurements such as QRT-PCR, including use of both SYBR® Green and TaqMan® technology, and which corresponds in direct proportion with the amount of product of the gene in a sample. Level of expression when referring to RNA can also refer to a measurable quantity of a given nucleic acid as determined by PCR wherein the number of cycles of PCR is limited to 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, or 60 cycles. The level of expression when referring to RNA can also refer to a measurable quantity of a given nucleic acid as determined relative to the amount of total RNA, or cDNA used in QRT-PCR wherein the amount of total RNA used is 100 ng; 50 ng; 25 ng; 10 ng; 5 ng; 1.25 ng; 0.05 ng; 0.3 ng; 0.1 ng; 0.09 ng; 0.08 ng; 0.07 ng; 0.06 ng; or 0.05 ng. The level of expression of a nucleic acid can be determined by any methods known in the art. For microarray analysis, the level of expression is measured by hybridization analysis using nucleic acids corresponding to RNA isolated from one or more individuals according to methods well known in the art. The label can either be incorporated into the RNA or used in another manner as would be understood so as to monitor hybridization. The label used can be a luminescent label, an enzymatic label, a radioactive label, a chemical label or a physical label.
In some embodiments, target and/or probe nucleic acids are labeled with a fluorescent molecule. Preferred fluorescent labels include, but are not limited to: fluorescein, amino coumarin acetic acid, tetramethylrhodamine isothiocyanate (TRITC), Texas Red, Cyanine 3 (Cy3) and Cyanine 5 (Cy5). The level of expression when referring to RNA can also refer to a measurable quantity of a given nucleic acid as determined relative to the amount of total RNA or cDNA used in microarray hybridizations wherein the amount of total RNA is 10 μg, 5 μg, 2.5 μg, 2 μg, 1 μg, 0.5 μg, 0.1 μg, 0.05 μg, 0.01 μg, 0.005 μg, 0.001 μg and the like.
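The definition does not prescribe how QRT-PCR readings are turned into a relative level. One widely used option, shown here as an assumption rather than the method of the invention, is the comparative Ct (2^−ΔΔCt) method: the target's threshold cycle is normalized to a reference (housekeeping) gene, which the text does not name, and then compared between a test sample and a control sample. It assumes roughly 100% amplification efficiency for both genes; the Ct values are hypothetical.

```python
def relative_quantity_ddct(ct_target_test, ct_ref_test,
                           ct_target_ctrl, ct_ref_ctrl, base=2.0):
    """Relative level of a target transcript by the comparative Ct method.

    Each Ct is the PCR cycle at which fluorescence crosses threshold; a lower
    Ct means more starting template. base=2.0 assumes ~100% amplification
    efficiency (doubling per cycle) for both target and reference.
    """
    delta_test = ct_target_test - ct_ref_test   # normalize to reference gene
    delta_ctrl = ct_target_ctrl - ct_ref_ctrl
    delta_delta = delta_test - delta_ctrl
    return base ** (-delta_delta)


# Hypothetical Ct values: target 24 vs 26 cycles, reference 18 in both.
print(relative_quantity_ddct(24.0, 18.0, 26.0, 18.0))  # 4.0
```

A target crossing threshold two cycles earlier after normalization corresponds to about a fourfold higher level, which under the definitions above would count as up regulation relative to the control.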

As used herein, a “ligand” is a molecule that binds to another. “Polynucleotide ligands” are those that specifically and/or selectively hybridize to products of the biomarkers and/or derivatives thereof. Polynucleotide ligands can specifically and/or selectively hybridize to RNA and/or protein products of the biomarkers, allowing measurement of the levels of the biomarker products. The polynucleotide ligands may be any of various types of molecule, including, but not limited to, any of various combinations of oligonucleotides, cDNA, DNA, RNA, PCR products, synthetic DNA, synthetic RNA, and/or modified nucleotides.

A ligand of the invention can also include a “polypeptide ligand” that specifically or selectively binds to the biomarker products, for example, allowing detection or measurement of the level of biomarker products, including RNA products and/or protein products. A polypeptide ligand may include a scaffold peptide, a linear peptide, or a cyclic peptide. In a preferred embodiment, the polypeptide ligand is an antibody. The antibody can be a human antibody, a chimeric antibody, a recombinant antibody, a humanized antibody, a monoclonal antibody, or a polyclonal antibody. The antibody can be an intact immunoglobulin, e.g., an IgA, IgG, IgE, IgD, or IgM, or a subtype thereof. The antibody can be conjugated to a functional moiety (e.g., a compound which has a biological or chemical function, which may be a second, different polypeptide, a therapeutic drug, a cytotoxic agent, a detectable moiety, or a support). A polypeptide ligand, e.g., an antibody of the invention, interacts with a polypeptide encoded by one of the genes of a biomarker with high affinity and specificity. For example, the polypeptide ligand binds to a polypeptide encoded by one of the genes of a biomarker with an affinity constant of at least 10⁷ M⁻¹, preferably at least 10⁸ M⁻¹, 10⁹ M⁻¹, or 10¹⁰ M⁻¹. The polynucleotide ligands and protein ligands may be used, according to standard art knowledge, to practice techniques such as Western blotting, immunoprecipitation, enzyme-linked immunosorbent assay (ELISA), protein microarray analysis and the like to measure the level of disclosed biomarker protein products.
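The affinity thresholds above are association constants (Ka). Binding data are often reported instead as a dissociation constant, Kd = 1/Ka, so a threshold of 10⁷ M⁻¹ corresponds to a Kd of 100 nM or tighter. A minimal conversion sketch follows; the function names are hypothetical.

```python
def kd_molar(ka_per_molar: float) -> float:
    """Dissociation constant Kd (M) from an association constant Ka (M^-1)."""
    if ka_per_molar <= 0:
        raise ValueError("Ka must be positive")
    return 1.0 / ka_per_molar


def meets_affinity_threshold(ka_per_molar: float, min_ka: float = 1e7) -> bool:
    """True if the ligand binds at least as tightly as the minimum Ka."""
    return ka_per_molar >= min_ka


# Express each preferred Ka threshold as the equivalent Kd in nanomolar.
for ka in (1e7, 1e8, 1e9, 1e10):
    print(f"Ka = {ka:.0e} M^-1 -> Kd = {kd_molar(ka) * 1e9:g} nM")
```

The loop prints Kd values of 100, 10, 1 and 0.1 nM for the four thresholds: a larger Ka means a smaller Kd and tighter binding.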

An “mRNA” means an RNA complementary to a gene; an mRNA includes a protein coding region, and also may include 5′ and 3′ untranslated regions (UTRs).

As used herein, the term “majority” refers to a number representing more than 50% (e.g., 51%, 60%, 70%, 80%, 90% or up to 100%) of the total members of a composition. When referring to an array, the term “majority” means more than 50% (e.g., 51%, 60%, 70%, 80%, 90% or up to 100%) of the total nucleic acid members that are stably associated with the solid substrate of the array.

Treatment of one or more colorectal pathologies or one or more subtypes of colorectal pathology is defined herein as providing medical aid to counteract the disease itself, its symptoms and/or its progression. Treatments also include removing the one or more colorectal pathologies and include palliative therapy to help relieve symptoms and improve quality of life. Treatments also include reducing or preventing polyp formation and reducing or preventing polyp differentiation or morphology changes, and can also include reducing or preventing polyp development, recurrence and onset.

As used herein, “mRNA integrity” refers to the quality of mRNA extracts from tissue samples or other samples. In one embodiment, mRNA extracts with good integrity do not appear to be degraded when examined by methods well known in the art, for example, by RNA agarose gel electrophoresis (e.g., Ausubel et al., John Wiley & Sons, Inc., 1997, Current Protocols in Molecular Biology). Preferably, the mRNA samples have good integrity (e.g., less than 10%, in some embodiments less than 5%, and in still other embodiments less than 1% of the mRNA is degraded) so as to truly represent the gene expression levels of the sample from which they are extracted.

As used herein, “nucleic acid(s)” and “nucleic acid molecule(s)” are interchangeable with the term “polynucleotide(s)” and generally refer to any polyribonucleotide or polydeoxyribonucleotide, which may be unmodified RNA or DNA, modified RNA or DNA, or any combination thereof. “Nucleic acids” include, without limitation, single- and double-stranded nucleic acids. As used herein, the term “nucleic acid(s)” also includes DNAs or RNAs as described above that contain one or more modified bases. Thus, DNAs or RNAs with backbones modified for stability or for other reasons are “nucleic acids.” The term “nucleic acids” as it is used herein embraces such chemically, enzymatically or metabolically modified forms of nucleic acids, as well as the chemical forms of DNA and RNA characteristic of viruses and cells, including, for example, simple and complex cells. A “nucleic acid” or “nucleic acid sequence” may also include regions of single- or double-stranded RNA or DNA or any combinations thereof, and can include expressed sequence tags (ESTs) according to some embodiments of the invention. An EST is a portion of the expressed sequence of a gene (i.e., the “tag” of a sequence), made by reverse transcribing a region of mRNA so as to make cDNA.
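To make the EST definition concrete: reverse transcription produces a first cDNA strand that is the reverse complement of the mRNA region, with T in place of U. The sketch below works on plain sequence strings and ignores laboratory details such as priming and incomplete transcripts; the function name is hypothetical.

```python
# Base pairing from RNA bases to the DNA bases of the first cDNA strand.
_RNA_TO_DNA_COMPLEMENT = str.maketrans("ACGU", "TGCA")


def first_strand_cdna(mrna: str) -> str:
    """Return the first-strand cDNA, written 5'->3', for an mRNA written 5'->3'."""
    mrna = mrna.upper()
    invalid = set(mrna) - set("ACGU")
    if invalid:
        raise ValueError(f"unexpected bases in mRNA: {sorted(invalid)}")
    # Complement each base, then reverse so the new strand also reads 5'->3'.
    return mrna.translate(_RNA_TO_DNA_COMPLEMENT)[::-1]


print(first_strand_cdna("AUGGCC"))  # GGCCAT
```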

As defined herein, a “nucleic acid array” refers to a plurality of nucleic acids (or “nucleic acid members”) localized on a support, where each of the nucleic acid members is localized to a unique pre-selected region of the support. In one embodiment, a nucleic acid member is attached to the surface of the support and the nucleic acid member is DNA. In another embodiment, the nucleic acid member is either cDNA or an oligonucleotide. In another embodiment, the nucleic acid member localized on the support is cDNA synthesized by polymerase chain reaction (PCR). The term “nucleic acid”, as used herein, is interchangeable with the term “polynucleotide.” In another preferred embodiment, a “nucleic acid array” refers to a plurality of unique nucleic acids attached to nitrocellulose or other membranes used in Southern and/or Northern blotting techniques.

As used herein, a “nucleic acid sample for hybridization to an array” is defined as a nucleic acid isolated and/or derived from a sample that is capable of binding to a nucleic acid of complementary sequence bound to an array, through sets of non-covalent bonding interactions including complementary base pairing interactions. The nucleic acid sample for hybridization to an array can be either an isolated nucleic acid sequence corresponding to a gene or portion thereof, or total RNA or mRNA isolated from a sample. In one embodiment, the nucleic acid sample for hybridization to an array is a blood nucleic acid sample (including whole blood, lysed blood, serum-reduced or erythrocyte-reduced blood, or peripheral blood leukocytes (PBLs)). In some embodiments, the nucleic acid sample is single- or double-stranded DNA, RNA, or DNA-RNA hybrids from human blood, and in some embodiments from RNA or mRNA extracts.

As used herein, a “nucleic acid member on an array” or a “nucleic acid member” includes a nucleic acid immobilized on an array and capable of binding to nucleic acid probes or samples of complementary sequence through sets of non-covalent bonding interactions, including complementary base pairing interactions. As used herein, a nucleic acid member or target may include natural (i.e., A, G, C, or T) or modified bases (7-deazaguanosine, inosine, etc.). In addition, the bases in nucleic acids may be joined by a linkage other than a phosphodiester bond, so long as it does not interfere with hybridization (i.e., the nucleic acid target still specifically binds to its complementary sequence under standard stringent or selective hybridization conditions). Thus, nucleic acid members may be peptide nucleic acids in which the constituent bases are joined by peptide bonds rather than phosphodiester linkages. In one embodiment, a conventional nucleic acid array of ‘target’ sequences bound to the array can be representative of the entire human genome, e.g., an Affymetrix chip, and the biomarker or isolated biomarker consisting of or comprising one or more of the genes set out in Table 1, Table 2, Table 11, or Table 12, or gene probes (e.g., Table 4), is applied to the conventional array. In another embodiment, sequences bound to the array can be the biomarker or isolated biomarker according to the invention, and total cellular RNA is applied to the array.

As used herein, the term “oligonucleotide” is defined as a molecule composed of two or more deoxyribonucleotides and/or ribonucleotides, and preferably more than three. Its exact size will depend upon many factors, which in turn depend upon the ultimate function and use of the oligonucleotide. The oligonucleotides may be from about 8 to about 1,000 nucleotides long. Although oligonucleotides of 8 to 100 nucleotides are useful in the invention, preferred oligonucleotides range from about 8 to about 15 bases in length, from about 8 to about 20 bases in length, from about 8 to about 25 bases in length, from about 8 to about 30 bases in length, from about 8 to about 40 bases in length, or from about 8 to about 50 bases in length.

As used herein, “patient” or “individual” or “subject” refers to a mammal and, in some embodiments, a human.

As used herein, the term “peptide” refers to a polypeptide which is 50 amino acids in length or less.

As used herein, the phrase “pharmaceutically acceptable salt(s)” includes, but is not limited to, salts of acidic or basic groups that may be present in compounds identified using the methods of the present invention. Compounds that are basic in nature are capable of forming a wide variety of salts with various inorganic and organic acids. The acids that can be used to prepare pharmaceutically acceptable acid addition salts of such basic compounds are those that form non-toxic acid addition salts, i.e., salts containing pharmacologically acceptable anions, including but not limited to sulfuric, citric, maleic, acetic, oxalic, hydrochloride, hydrobromide, hydroiodide, nitrate, sulfate, bisulfate, phosphate, acid phosphate, isonicotinate, acetate, lactate, salicylate, citrate, acid citrate, tartrate, oleate, tannate, pantothenate, bitartrate, ascorbate, succinate, maleate, gentisinate, fumarate, gluconate, glucuronate, saccharate, formate, benzoate, glutamate, methanesulfonate, ethanesulfonate, benzenesulfonate, p-toluenesulfonate and pamoate (i.e., 1,1′-methylene-bis-(2-hydroxy-3-naphthoate)) salts. Compounds that include an amino moiety may form pharmaceutically acceptable salts with various amino acids, in addition to the acids mentioned above. Compounds that are acidic in nature are capable of forming base salts with various pharmacologically acceptable cations. Examples of such salts include alkali metal or alkaline earth metal salts and, particularly, calcium, magnesium, sodium, lithium, zinc, potassium, and iron salts.

As used herein, “polynucleotide” encompasses single- and double-stranded polynucleotides, such as double-stranded DNA, single-stranded DNA, double-stranded RNA, single-stranded RNA or DNA-RNA double-stranded hybrids, and the like, of more than 8 nucleotides in length. The term “polynucleotide” includes a polymeric form of nucleotides of any length, either ribonucleotides or deoxyribonucleotides, that comprise purine and pyrimidine bases, or other natural, chemically or biochemically modified, non-natural, or derivatized nucleotide bases. The backbone of the polynucleotide can comprise sugars and phosphate groups, as may typically be found in RNA or DNA, or modified or substituted sugar or phosphate groups. A polynucleotide may comprise modified nucleotides, such as methylated nucleotides and nucleotide analogs. The sequence of nucleotides may be interrupted by non-nucleotide components.

As used herein, a “polynucleotide ligand that specifically and/or selectively hybridizes to RNA products of the biomarkers” (“biomarker RNA products”) refers to a polynucleotide that specifically and/or selectively hybridizes to biomarker RNA products and/or to polynucleotides corresponding to biomarker RNA products, allowing measurement of the levels of those RNA products.

The polynucleotide ligands may be any of various types of molecule, including but not limited to, any of various combinations of oligonucleotides, cDNA, DNA, RNA, PCR products, synthetic DNA, synthetic RNA, and/or modified nucleotides.

As used herein, the term “proteinaceous agent” refers to polypeptides, proteins, peptides, and the like.

As used herein, “polypeptide sequences encoded by or protein products of” refers to the amino acid sequences obtained after translation of the protein coding region of an mRNA transcribed from a gene. As would be understood, one or more mRNA nucleotide sequences for each of the genes (biomarkers) of the invention can be identified using public databases such as the NCBI database found at http://www.ncbi.nlm.nih.gov. For example, representative mRNA species of the biomarkers identified in Table 2 and Table 12 are provided by their Human GenBank Accession numbers (see Table 3 and Table 13, respectively), and the corresponding polypeptide sequences are identified by Protein Accession numbers (see Table 3 and Table 13, respectively). These GenBank Accession numbers provide the sequences of products of the biomarkers. When a protein or fragment of a protein is used to immunize a host animal, numerous regions of the protein may induce the production of antibodies which bind specifically to a given region or three-dimensional structure on the protein; these regions or structures are referred to as epitopes or antigenic determinants. As used herein, “antigenic fragments” refers to portions of a polypeptide that contain one or more epitopes. Epitopes can be linear, comprising essentially a linear sequence from the antigen, or conformational, comprising sequences which are genetically separated by other sequences but come together structurally at the binding site for the polypeptide ligand. “Antigenic fragments” may be 5000, 1000, 500, 400, 300, 200, 100, 50, 25, 20, 10 or 5 amino acids in length.

As used herein, the terms “prevent,” “preventing,” and “prevention” refer to the prevention of the development, recurrence, formation, growth or transformation of colorectal pathology, including polyps or subtypes of polyps, resulting from the administration of one or more compounds identified in accordance with the methods of the invention or the administration of a combination of such a compound and another therapy.

The term “primer”, as used herein, refers to an oligonucleotide, whether occurring naturally as in a purified restriction digest or produced synthetically, which is capable of acting as a point of initiation of synthesis when placed under conditions in which synthesis of a primer extension product, which is complementary to a nucleic acid strand, is induced, i.e., in the presence of nucleotides and an inducing agent such as a DNA polymerase and at a suitable temperature and pH. The primer may be either single-stranded or double-stranded and must be sufficiently long to prime the synthesis of the desired extension product in the presence of the inducing agent. The exact length of the primer will depend upon many factors, including temperature, source of primer and the method used, and also the specificity or selectivity of the desired priming (i.e., so as to act as a point of initiation of synthesis which is specific or selective for a given sequence of polynucleotide). For example, for testing applications, depending on the complexity of the target sequence, the oligonucleotide primer typically contains 15-25 nucleotides, but may contain more or fewer nucleotides. In addition, in some cases the primer can be selected so as to have a high GC content, so as to bind to regions which do not contain SNPs, so as to span an intron/exon junction of RNA, and the like. Other factors involved in determining the appropriate length and/or features of a primer are readily known to one of ordinary skill in the art.
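Several of the selection criteria above (a typical length of 15-25 nucleotides, a high GC content, avoidance of regions containing SNPs) can be screened automatically. This is a minimal sketch under stated assumptions: the 50% GC floor, the 0-based template coordinate and the list of SNP positions are illustrative inputs, not values from this document, and practical primer design also weighs melting temperature, secondary structure and intron/exon junctions.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a DNA sequence."""
    if not seq:
        raise ValueError("empty sequence")
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def primer_problems(seq: str, start: int, snp_positions=(),
                    min_len: int = 15, max_len: int = 25,
                    min_gc: float = 0.5) -> list:
    """List reasons a candidate primer fails the screen (empty list = passes).

    start is the 0-based template position where the primer's binding site begins.
    """
    problems = []
    if not min_len <= len(seq) <= max_len:
        problems.append(f"length {len(seq)} nt outside {min_len}-{max_len} nt")
    gc = gc_fraction(seq)
    if gc < min_gc:
        problems.append(f"GC content {gc:.0%} below {min_gc:.0%}")
    # A SNP under the primer could weaken binding in some individuals.
    covered = sorted(p for p in snp_positions if start <= p < start + len(seq))
    if covered:
        problems.append(f"overlaps SNP position(s) {covered}")
    return problems


print(primer_problems("GCGCGCATATGCGCGCAT", start=100, snp_positions=[90, 130]))  # []
print(primer_problems("GCGCGCATATGCGCGCAT", start=100, snp_positions=[105]))
```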

The term “biomarker specific set of primers” or “primer sets” as used herein refers to a set of polynucleotide primers wherein one primer primes the synthesis of a sense strand and the second primer primes the synthesis of an antisense strand, so as to produce double-stranded DNA complementary to a portion of one or more RNA products of the biomarker of the invention. For example, the primers can include a first primer, which is a sequence that can selectively hybridize to RNA, cDNA or EST complementary to a region of the biomarker of the invention to create an extension product, and a second primer capable of selectively hybridizing to the extension product; together they are used to produce double-stranded DNA complementary to a region of the biomarker of the invention or products of the biomarker of the invention. The invention includes primers useful for measuring the level of RNA products of a biomarker. Table 4, Table 6, Table 14, Table 16, and Table 17 provide representative species of primers of the invention. A biomarker specific set of primers can be selected so that it will selectively amplify only portions of a polynucleotide complementary to one or more RNA products of a biomarker and will not amplify portions of polynucleotides complementary to other biomarkers.
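The behavior of a primer set can be simulated in silico: the forward primer matches the sense strand, and the reverse primer matches the reverse complement of a downstream region. In the sense used above, a set is biomarker specific if it yields a product only from the intended template. This sketch assumes perfect, ungapped primer matches on a DNA template (for example, cDNA), which real PCR does not strictly require; the sequences are made up and all names are hypothetical.

```python
from typing import Optional

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(_COMPLEMENT)[::-1]


def predicted_amplicon(template: str, fwd: str, rev: str) -> Optional[str]:
    """Product a primer pair would amplify from a sense-strand template, or None.

    Both primers are written 5'->3'. The forward primer occurs in the sense
    strand; the reverse primer anneals to it, so its reverse complement
    occurs downstream of the forward primer.
    """
    template = template.upper()
    start = template.find(fwd.upper())
    if start == -1:
        return None
    end_site = template.find(reverse_complement(rev), start + len(fwd))
    if end_site == -1:
        return None
    return template[start:end_site + len(rev)]


target = "GGGATGCGTACCCCCTTAGGCATGGG"      # toy biomarker cDNA (made up)
off_target = "GGGATGCGTACCCCCAAAAAAAAGGG"  # toy unrelated cDNA (made up)
fwd, rev = "ATGCGTAC", "ATGCCTAA"
print(predicted_amplicon(target, fwd, rev))      # ATGCGTACCCCCTTAGGCAT
print(predicted_amplicon(off_target, fwd, rev))  # None
```

Here the off-target template shares the forward-primer site but lacks the reverse-primer site, so no product is predicted; checking a primer set against every other biomarker sequence this way is one crude test of specificity.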

As used herein, the term “probe” means oligonucleotides and analogs thereof and refers to a range of chemical species that recognize polynucleotide target sequences through hydrogen bonding interactions with the nucleotide bases of the target sequences. The probe or the target sequences may be single- or double-stranded RNA or single- or double-stranded DNA or a combination of DNA and RNA bases. A probe is at least 8 nucleotides in length and less than the length of a complete gene. A probe may be 10, 20, 30, 50, 75, 100, 150, 200, 250, 400, 500 and up to 2000 nucleotides in length, as long as it is less than the full length of the target gene. In some embodiments, probes can be used as target sequences bound on a microarray. In some embodiments, probes can be used for quantitative real-time PCR (QRT-PCR) and include modifications so as to incorporate a fluorophore, a quencher, a minor groove binding reagent or other substances which allow detection of the probe during PCR amplification. The probe can also be modified so as to have both a detectable tag and a quencher molecule, for example TaqMan® and Molecular Beacon® probes. The invention includes probes useful for measuring the expression of RNA products of biomarkers of the invention. For example, Table 4, Table 6, Table 14, and Table 17 provide some representative species of a probe of the invention useful for QRT-PCR.

The oligonucleotides and analogs thereof may be RNA or DNA, or analogs of RNA or DNA, commonly referred to as antisense oligomers or antisense oligonucleotides. Such RNA or DNA analogs comprise, but are not limited to, 2′-O-alkyl sugar modifications, methylphosphonate, phosphorothioate, phosphorodithioate, formacetal, 3′-thioformacetal, sulfone, sulfamate, and nitroxide backbone modifications, and analogs wherein the base moieties have been modified. In addition, analogs of oligomers may be polymers in which the sugar moiety has been modified or replaced by another suitable moiety, resulting in polymers which include, but are not limited to, morpholino analogs and peptide nucleic acid (PNA) analogs (Egholm et al. Peptide Nucleic Acids (PNA)—Oligonucleotide Analogues with an Achiral Peptide Backbone, (1992)).

Probes may also be mixtures of any of the oligonucleotide analog types together or in combination with native DNA or RNA and may also include linker species. At the same time, the oligonucleotides and analogs thereof may be used alone or in combination with one or more additional oligonucleotides or analogs thereof.

As used herein, the term “products of the biomarker” or “biomarker products” refers to a species of RNA or a species of protein (wherein a species of RNA or protein can include multiple copies) isolated and/or derived from a sample, including a tissue sample, a lymph sample, a lymph tissue sample, a blood sample, or a fraction of a blood sample, which corresponds to the biomarker (i.e., is transcribed from the gene or genetic element or is translated from RNA which is transcribed from the gene or genetic element). See Table 3 and Table 13. The RNA may be pre-mRNA, mRNA, spliced variants of mRNA and the like. The protein may be in its native state or post-translationally processed in any one of various ways.

As used herein, “a plurality of” or “a set of” refers to two or more, for example, 2 or more, 3 or more, 4 or more, 5 or more, 6 or more, 7 or more, 8 or more, 9 or more, 10 or more, etc.

As used herein, “pre-selected region”, “predefined region”, or “unique position” refers to a localized area on a substrate which is, was, or is intended to be used for the deposit of a nucleic acid and is otherwise referred to herein in the alternative as a “selected region” or simply a “region.” The pre-selected region may have any convenient shape, e.g., circular, rectangular, elliptical, wedge-shaped, etc. In some embodiments, a pre-selected region is smaller than about 1 cm², more preferably less than 1 mm², still more preferably less than 0.5 mm², and in some embodiments less than 0.1 mm². A nucleic acid member at a “pre-selected region”, “predefined region”, or “unique position” is one whose identity (e.g., sequence) can be determined by virtue of its position at the region or unique position.
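Because each nucleic acid member sits at a unique pre-selected region, an array can be modeled as a mapping from region coordinates to member sequences, and a hybridization readout can be attributed to a sequence purely by position. The class and method names below, and the (row, column) coordinate scheme, are illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class NucleicAcidArray:
    """Toy array model: each pre-selected region (row, col) holds one member."""
    members: dict = field(default_factory=dict)

    def spot(self, region: tuple, sequence: str) -> None:
        """Deposit a member at a region; an occupied region is rejected."""
        if region in self.members:
            raise ValueError(f"region {region} is already occupied")
        self.members[region] = sequence.upper()

    def identify(self, region: tuple) -> str:
        """Recover a member's identity (sequence) from its position alone."""
        return self.members[region]

    def read_out(self, intensities: dict) -> list:
        """Attribute per-region hybridization signals to member sequences."""
        return [(self.members[region], signal)
                for region, signal in sorted(intensities.items())]


chip = NucleicAcidArray()
chip.spot((0, 0), "acgtacgtacgt")
chip.spot((0, 1), "ttgcaaggcctt")
print(chip.identify((0, 1)))  # TTGCAAGGCCTT
print(chip.read_out({(0, 1): 87.5, (0, 0): 1520.0}))
```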

As used herein, the terms “prophylactic agent” and “prophylactic agents” refer to any compound(s) which can be used to prevent polyp formation, development, recurrence or onset. In certain embodiments, the term “prophylactic agent” refers to a compound identified in the screening assays described herein. In certain other embodiments, the term “prophylactic agent” refers to an agent, other than a compound identified in the screening assays described herein, which is known to be useful for, or has been or is currently being used to, prevent or impede the onset, development and/or progression or transformation of one or more colorectal pathologies, including one or more polyps or subtypes of polyps.

As used herein, the phrase “prophylactically effective amount” refers to the amount of a therapy (e.g., a prophylactic agent) which is sufficient to result in the prevention of the development, recurrence, onset, progression or transformation of one or more colorectal pathologies, including one or more polyps or subtypes of polyps; the reduction or amelioration of the progression and/or severity of one or more colorectal pathologies, including one or more polyps or subtypes of polyps; or the prevention of colorectal pathology, including polyps or subtypes of polyps, advancing to colorectal cancer.

As used herein, the terms “protein” and “polypeptide” are used interchangeably to refer to a chain of amino acids linked together by peptide bonds. In a specific embodiment, a protein is composed of less than 200, less than 175, less than 150, less than 125, less than 100, less than 50, less than 45, less than 40, less than 35, less than 30, less than 25, less than 20, less than 15, less than 10, or less than 5 amino acids linked together by peptide bonds. In another embodiment, a protein is composed of at least 200, at least 250, at least 300, at least 350, at least 400, at least 450, at least 500 or more amino acids linked together by peptide bonds.

A “protein coding region” refers to the portion of the mRNA encoding a polypeptide.
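The relationship between an mRNA, its protein coding region and the encoded polypeptide can be illustrated with a small translator using the standard genetic code. As a simplifying assumption, the sketch starts at the first AUG and stops at the first in-frame stop codon, treating everything outside that span as untranslated region; real start-site selection and non-standard genetic codes are ignored, and the function name is hypothetical.

```python
from itertools import product

# Standard genetic code; codons enumerated in U, C, A, G order ('*' = stop).
_BASES = "UCAG"
_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(_BASES, repeat=3), _AMINO_ACIDS)}


def translate_coding_region(mrna: str) -> str:
    """Translate from the first AUG to the first in-frame stop codon."""
    mrna = mrna.upper().replace("T", "U")  # accept cDNA-style input too
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# 5' UTR "GGC", coding region AUG GCC UGG, stop codon UAA, 3' UTR "GC".
print(translate_coding_region("GGCAUGGCCUGGUAAGC"))  # MAW
```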

As used herein, the “reference population” or “test population” refers to one or more populations of “control samples” used to develop one or more classifiers. In one embodiment, a single reference population can be divided into subpopulations. In another embodiment, two or more reference populations can be used. In some instances, a classifier can be developed to differentiate between individuals with one or more colorectal pathologies, one or more polyps, or one or more subtypes of polyps and individuals without the same colorectal pathology, polyps, or subtypes of polyps. In some instances, a first reference population would be composed of individuals with the one or more colorectal pathologies and a second reference population would be composed of individuals without the one or more colorectal pathologies. The “reference population” or “test population” can be composed of control samples from a number of individuals diagnosed with one or more colorectal pathologies, including one or more polyps or one or more subtypes of polyps, and individuals not having the colorectal pathologies, the one or more polyps, or the one or more subtypes of polyps, as determined using conventional diagnostic techniques. Note that in some embodiments the population of individuals having one or more colorectal pathologies can be selected to include individuals having a single subtype of polyp or one or more subtypes of polyps. In other embodiments, the individuals who do not have one or more colon pathologies can include individuals who have been diagnosed with other disease or diseases. In another embodiment, the individuals who do not have one or more colon pathologies can include individuals who have been diagnosed with other cancers. 
In one embodiment, the “reference population” or “test population” is composed of a roughly equivalent number of “control samples” from each trait subgroup (e.g., in this instance, wherein said trait is a determination of status with regard to the presence of colorectal pathology). In another embodiment, each trait subgroup (e.g., having or not having colorectal pathology) of the “reference population” has a similar distribution with regard to other traits, e.g., age, sex, drug status, etc.
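A classifier is trained on the two-population design above (samples with and without colorectal pathology). This sketch is not the classification formula of the invention; as a stand-in, it uses a simple nearest-centroid rule on biomarker levels and also checks that the subgroups are roughly equal in size. The function names, the 20% balance tolerance and the three-biomarker profiles are made up for illustration.

```python
from statistics import mean


def train_centroids(reference: dict) -> dict:
    """Mean biomarker profile of each reference subgroup.

    reference maps a label to a list of profiles; each profile is a list of
    biomarker levels measured in the same order.
    """
    return {label: [mean(levels) for levels in zip(*profiles)]
            for label, profiles in reference.items()}


def classify(profile: list, centroids: dict) -> str:
    """Assign the label whose centroid is closest in squared Euclidean distance."""
    def sq_distance(centroid):
        return sum((x - m) ** 2 for x, m in zip(profile, centroid))
    return min(centroids, key=lambda label: sq_distance(centroids[label]))


def is_balanced(reference: dict, tolerance: float = 0.2) -> bool:
    """True if subgroup sizes differ by at most `tolerance` of the largest."""
    sizes = [len(profiles) for profiles in reference.values()]
    return max(sizes) - min(sizes) <= tolerance * max(sizes)


# Toy, made-up levels for three hypothetical biomarkers.
reference = {
    "pathology": [[2.1, 0.4, 1.8], [2.4, 0.5, 1.6], [1.9, 0.3, 2.0]],
    "no_pathology": [[1.0, 1.2, 0.9], [0.8, 1.1, 1.0], [1.1, 1.3, 0.7]],
}
centroids = train_centroids(reference)
print(is_balanced(reference))                # True
print(classify([2.0, 0.5, 1.7], centroids))  # pathology
```

In practice the classifier would be validated on samples held out from the reference population; nearest-centroid is chosen here only because it is short and transparent.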

As used herein, the term “selectively binds” in the context of proteins encompassed by the invention refers to the specific interaction of any two of a peptide, a protein, a polypeptide, or an antibody, wherein the interaction occurs preferentially between those two molecules as compared with any other peptide, protein, polypeptide or antibody. For example, when the two molecules are protein molecules, a structure on the first molecule recognizes and binds to a structure on the second molecule, rather than to other proteins. “Selective binding”, as the term is used herein, means that a molecule binds its specific binding partner with at least 2-fold greater affinity, and preferably at least 10-fold, 20-fold, 50-fold, 100-fold or higher affinity, than it binds a non-specific molecule.
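The fold-selectivity criterion is a ratio of affinities. A minimal sketch, assuming affinities are supplied as association constants (Ka, M⁻¹) measured under comparable conditions; the names and example values are hypothetical.

```python
def fold_selectivity(ka_partner: float, ka_nonspecific: float) -> float:
    """How many times more tightly the molecule binds its partner."""
    if ka_partner <= 0 or ka_nonspecific <= 0:
        raise ValueError("affinity constants must be positive")
    return ka_partner / ka_nonspecific


def binds_selectively(ka_partner: float, ka_nonspecific: float,
                      min_fold: float = 2.0) -> bool:
    """Apply the 'at least 2-fold' rule; stricter cut-offs are 10, 20, 50, 100."""
    return fold_selectivity(ka_partner, ka_nonspecific) >= min_fold


print(fold_selectivity(1e8, 1e6))                   # 100.0
print(binds_selectively(3e7, 2e7))                  # False (only 1.5-fold)
print(binds_selectively(1e8, 1e6, min_fold=100.0))  # True
```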

As used herein, “selective hybridization” can refer to a hybridization which occurs between a polynucleotide and an RNA or protein product of the biomarker of the invention, wherein the polynucleotide binds to the RNA products of the biomarker of the invention preferentially over the RNA products of other genes in the genome in question. In a preferred embodiment, a polynucleotide which “selectively hybridizes” is one which hybridizes with a selectivity of greater than 70%, greater than 80%, greater than 90%, and most preferably 100% (i.e., cross-hybridization with other RNA species preferably occurs at less than 30%, less than 20%, or less than 10%). As would be understood by a person skilled in the art, whether a polynucleotide “selectively hybridizes” to the RNA products of a biomarker of the invention can be determined by taking into account its length and composition.

As used herein, “specifically hybridizes” or “specific hybridization” can refer to hybridization which occurs when two nucleic acid sequences are substantially complementary (at least about 65% complementary over a stretch of at least 14 to 25 nucleotides, preferably at least about 75% complementary, more preferably at least about 90% complementary). See Kanehisa, M., 1984, Nucleic Acids Res., 12:203, incorporated herein by reference. As a result, it is expected that a certain degree of mismatch is tolerated. Such mismatch may be small, such as a mono-, di- or tri-nucleotide. Alternatively, a region of mismatch can encompass loops, which are defined as regions in which there exists a mismatch in an uninterrupted series of four or more nucleotides. Numerous factors influence the efficiency and selectivity of hybridization of two nucleic acids, for example, the hybridization of a nucleic acid member on an array to a target nucleic acid sequence. These factors include nucleic acid member length, nucleotide sequence and/or composition, hybridization temperature, buffer composition and potential for steric hindrance in the region to which the nucleic acid member is required to hybridize. A positive correlation exists between nucleic acid length and both the efficiency and accuracy with which a nucleic acid will anneal to a target sequence. In particular, longer sequences have a higher melting temperature (Tm) than do shorter ones, and are less likely to be repeated within a given target sequence, thereby minimizing non-specific hybridization. Hybridization temperature varies inversely with nucleic acid member annealing efficiency. 
Similarly, the concentration of organic solvents, e.g., formamide, in a hybridization mixture varies inversely with annealing efficiency, while increases in salt concentration in the hybridization mixture facilitate annealing. Under stringent annealing conditions, longer nucleic acids hybridize more efficiently than do shorter ones, which are sufficient under more permissive conditions.
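The “at least about 65% complementary over a stretch of at least 14 to 25 nucleotides” criterion can be checked by sliding a probe along a target and counting Watson-Crick pairs in the antiparallel orientation. This sketch assumes DNA sequences and ungapped placement only, so it cannot score the loops described above; the names and the 14-nucleotide default are illustrative.

```python
_WATSON_CRICK = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}


def best_complementarity(probe: str, target: str) -> float:
    """Highest fraction of paired bases over all ungapped antiparallel placements."""
    probe, target = probe.upper(), target.upper()
    n = len(probe)
    if n == 0 or n > len(target):
        raise ValueError("probe must be non-empty and no longer than the target")
    # Reading the probe 3'->5' lines it up base-for-base with the target 5'->3'.
    probe_3to5 = probe[::-1]
    best = 0.0
    for offset in range(len(target) - n + 1):
        window = target[offset:offset + n]
        pairs = sum((p, t) in _WATSON_CRICK for p, t in zip(probe_3to5, window))
        best = max(best, pairs / n)
    return best


def specifically_hybridizes(probe: str, target: str,
                            min_length: int = 14, min_fraction: float = 0.65) -> bool:
    return len(probe) >= min_length and best_complementarity(probe, target) >= min_fraction


target = "AAAAAATGCATGCATGCATGCAAAA"
probe = "GCATGCATGCATGCAT"  # perfect complement of the central 16 nt
print(best_complementarity(probe, target))     # 1.0
print(specifically_hybridizes(probe, target))  # True
```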

As used herein, “spotting” or “attaching” refers to a process of depositing a nucleic acid member onto a solid substrate to form a nucleic acid array such that the nucleic acid is stably bound to the solid substrate via covalent bonds, hydrogen bonds or ionic interactions.

As used herein, “stably associated” refers to a nucleic acid that is stably bound to a solid substrate to form an array via covalent bonds, hydrogen bonds or ionic interactions such that the nucleic acid retains its unique pre-selected position relative to all other nucleic acids that are stably associated with an array, or to all other pre-selected regions on the solid substrate, under conditions in which an array is typically analyzed (i.e., during one or more steps of hybridization, washes, and/or scanning, etc.).

As used herein, “substrate” or “support” when referring to an array refers to a material capable of supporting or localizing an oligonucleotide or cDNA member. The support may be biological, non-biological, organic, inorganic, or a combination of any of these, existing as particles, strands, precipitates, gels, sheets, tubing, spheres, beads, containers, capillaries, pads, slices, films, plates, slides, chips, etc. Often, the substrate is a silicon or glass surface, (poly)tetrafluoroethylene, (poly)vinylidene difluoride, polystyrene, polycarbonate, a charged membrane, such as nylon 66 or nitrocellulose, or combinations thereof. In one embodiment, the support is glass. In some embodiments, at least one surface of the substrate will be substantially flat. In some embodiments, the support will contain reactive groups, including, but not limited to, carboxyl, amino, hydroxyl, thiol, and the like. In one embodiment, the support is optically transparent.

As used herein, the term “standard stringent conditions” means that hybridization will occur only if there is at least 95%, and preferably at least 97%, identity between the sequences, wherein the region of identity comprises at least 10 nucleotides. In one embodiment, the sequences hybridize under stringent conditions following incubation of the sequences overnight at 42° C., followed by stringent washes (0.2×SSC at 65° C.). The degree of stringency of washing can be varied by changing the temperature, pH, ionic strength, divalent cation concentration, volume and duration of the washing. For example, the stringency of hybridization may be varied by conducting the hybridization at varying temperatures below the melting temperatures of the probes. The melting temperature of the probe may be calculated using the following formulas:

For oligonucleotide probes between 14 and 70 nucleotides in length, the melting temperature (Tm) in degrees Celsius may be calculated using the formula: Tm = 81.5 + 16.6(log[Na⁺]) + 0.41(%G+C) − (600/N), where %G+C is the percentage of G and C bases and N is the length of the oligonucleotide.

For example, the hybridization temperature may be decreased in increments of 5° C. from 68° C. to 42° C. in a hybridization buffer having a Na⁺ concentration of approximately 1 M. Following hybridization, the filter may be washed with 2×SSC, 0.5% SDS at the temperature of hybridization. These conditions are considered to be “moderate stringency” conditions above 50° C. and “low stringency” conditions below 50° C. A specific example of “moderate stringency” hybridization conditions is when the above hybridization is conducted at 55° C. A specific example of “low stringency” hybridization conditions is when the above hybridization is conducted at 45° C.

If the hybridization is carried out in a solution containing formamide, the melting temperature may be calculated using the equation Tm = 81.5 + 16.6(log [Na⁺]) + 0.41(% G+C) − 0.63(% formamide) − (600/N), where N is the length of the probe.
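The two melting-temperature formulas above differ only by the formamide term, so a single function can cover both. This is a minimal Python sketch: the function name is hypothetical, [Na⁺] is taken in mol/L with a base-10 logarithm, and G+C content is taken as a percentage (0 to 100), which is the form in which the 0.41 coefficient is normally applied.

```python
import math


def melting_temperature(na_molar, percent_gc, length_nt, percent_formamide=0.0):
    """Estimate probe Tm in degrees Celsius.

    Tm = 81.5 + 16.6*log10([Na+]) + 0.41*(%G+C) - 0.63*(%formamide) - 600/N
    With percent_formamide=0 this is the formamide-free formula, which the
    text states for oligonucleotide probes of 14 to 70 nucleotides.
    """
    if na_molar <= 0 or length_nt <= 0:
        raise ValueError("Na+ concentration and probe length must be positive")
    return (81.5
            + 16.6 * math.log10(na_molar)
            + 0.41 * percent_gc
            - 0.63 * percent_formamide
            - 600.0 / length_nt)


# A 20-nt probe with 50% G+C in ~1 M Na+, without and with 50% formamide.
print(round(melting_temperature(1.0, 50, 20), 1))                        # 72.0
print(round(melting_temperature(1.0, 50, 20, percent_formamide=50), 1))  # 40.5
```

Each 10% of formamide lowers the estimate by 6.3° C., which is why stepping the formamide concentration down at a fixed 42° C. has the same effect on stringency as raising the temperature.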

For example, the hybridization may be carried out in buffers, such as 6×SSC, containing formamide at a temperature of 42° C. In this case, the concentration of formamide in the hybridization buffer may be reduced in 5% increments from 50% to 0% to identify clones having decreasing levels of homology to the probe. Following hybridization, the filter may be washed with 6×SSC, 0.5% SDS at 50° C. These conditions are considered to be “moderate stringency” conditions above 25% formamide and “low stringency” conditions below 25% formamide. A specific example of “moderate stringency” hybridization conditions is when the above hybridization is conducted at 30% formamide. A specific example of “low stringency” hybridization conditions is when the above hybridization is conducted at 10% formamide.
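The two stepwise series described above (temperature lowered from 68° C. in 5° C. steps at about 1 M Na⁺, or formamide lowered from 50% in 5% steps at 42° C.) and their stringency thresholds can be tabulated with a short Python sketch. The helper names are hypothetical. Because the text does not classify values exactly at 50° C. or 25% formamide, the sketch labels those "boundary"; note also that 5° C. steps from 68° C. end at 43° C., the last value not below 42° C.

```python
def descending_series(start, stop, step):
    """Values from start down to stop (inclusive, if reached) in fixed steps."""
    values = []
    value = start
    while value >= stop:
        values.append(value)
        value -= step
    return values


def stringency(value, threshold):
    """'moderate' above the threshold, 'low' below it, per the conditions above."""
    if value > threshold:
        return "moderate"
    if value < threshold:
        return "low"
    return "boundary"  # exactly at the threshold; not classified in the text


# Temperature series (threshold 50 deg C) and formamide series (threshold 25%).
for temp_c in descending_series(68, 42, 5):
    print(f"{temp_c} C: {stringency(temp_c, 50)}")
for formamide_pct in descending_series(50, 0, 5):
    print(f"{formamide_pct}% formamide: {stringency(formamide_pct, 25)}")
```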

As used herein, the term “significant match”, when referring to nucleic acid sequences, means that two nucleic acid sequences exhibit at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, and preferably at least 90% identity, using comparison methods well known in the art (e.g., Altschul, S. F. et al., 1997, Nucl. Acids Res., 25:3389-3402; Schäffer, A. A. et al., 1999, Bioinformatics 15:1000-1011). As used herein, “significant match” encompasses non-contiguous or scattered identical nucleotides so long as the sequences exhibit at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, and preferably at least 90% identity when maximally aligned using alignment methods routine in the art.
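The identity thresholds in this definition apply to sequences that have already been aligned; the BLAST-family tools cited compute the alignment and report identity themselves. As a minimal sketch of the scoring step only, the hypothetical Python helpers below assume two pre-aligned sequences of equal length with '-' marking gaps, and count gap columns in the denominator, which is one of several identity conventions in use.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of alignment columns where both sequences carry the same base.

    Assumes the sequences are already aligned (equal length, '-' for gaps).
    Gap columns count toward the denominator but never as matches.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("empty alignment")
    matches = sum(1 for a, b in zip(aligned_a.upper(), aligned_b.upper())
                  if a == b and a != "-")
    return 100.0 * matches / len(aligned_a)


def is_significant_match(aligned_a, aligned_b, threshold=65.0):
    """True if identity meets the chosen threshold (65% is the lowest listed)."""
    return percent_identity(aligned_a, aligned_b) >= threshold


print(percent_identity("ACGTACGTAC", "ACGTTCGTAC"))      # 90.0
print(is_significant_match("ACGTACGTAC", "ACGTTCGTAC"))  # True
```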

As used herein, the term “synergistic” refers to a combination of a compound identified using one of the methods described herein and another therapy (e.g., agent) which is more effective than the additive effects of the therapies. In some embodiments, such other therapy has been or is currently being used to prevent, treat, or ameliorate one or more colorectal pathologies including one or more polyps or one or more subtypes of polyps. A synergistic effect of a combination of therapies (e.g., prophylactic or therapeutic agents) permits the use of lower dosages of one or more of the therapies and/or less frequent administration of said therapies to an individual with colorectal pathologies including polyps or subtypes of polyps. The ability to utilize lower dosages of a therapy (e.g., a prophylactic or therapeutic agent) and/or to administer said therapy less frequently reduces the toxicity associated with the administration of said agent to an individual without reducing the efficacy of said therapies in the prevention or treatment of colorectal pathology including polyps or subtypes of polyps. In addition, a synergistic effect can result in improved efficacy of therapies (e.g., agents) in the prevention or treatment of colorectal pathology including polyps or subtypes of polyps. Finally, a synergistic effect of a combination of therapies (e.g., prophylactic or therapeutic agents) may avoid or reduce adverse or unwanted side effects associated with the use of either therapy alone.

As used herein, a “therapeutic agent” or “agent” refers to a compound that increases or decreases the expression of polynucleotide or polypeptide sequences that are differentially expressed in a sample from an individual having one or more colorectal pathologies including polyps or a subtype of polyps. The invention provides for a “therapeutic agent” that 1) prevents the formation of colorectal pathology, 2) reduces, delays, or eliminates advancement or transformation of colorectal pathology, and/or 3) restores one or more expression profiles of one or more colorectal pathology indicative nucleic acids or polypeptides of a patient to a profile more similar to that of a normal individual when administered to a patient. In addition, the terms “therapeutic agent” and “therapeutic agents” refer to any compound(s) which can be used in the treatment or prevention of colorectal pathology or polyps or a subtype of polyps. In certain embodiments, the term “therapeutic agent” refers to a compound identified in the screening assays described herein. In other embodiments, the term “therapeutic agent” refers to an agent other than a compound identified in the screening assays described herein which is known to be useful for, or has been or is currently being used to treat or prevent, colorectal pathology or polyps or subtypes of polyps.

As used herein, the term “therapeutically effective amount” refers to that amount of a therapy (e.g., a therapeutic agent) sufficient to treat one or more colorectal pathologies including polyps or one or more subtypes of polyps; prevent one or more colorectal pathologies including polyps or one or more subtypes of polyps; prevent colorectal pathologies including polyps or one or more subtypes of polyps from transforming and/or advancing to colorectal cancer; cause regression of colorectal pathology, polyps or one or more subtypes of polyps; or enhance or improve the therapeutic effect(s) of another therapy (e.g., therapeutic agent). In a specific embodiment, a therapeutically effective amount refers to the amount of a therapy (e.g., a therapeutic agent) that modulates gene expression of the products of the biomarkers of the invention. In some embodiments, a therapeutically effective amount of a therapy (e.g., a therapeutic agent) modulates gene expression of the products of the biomarkers of the invention by at least 5%, preferably at least 10%, at least 15%, at least 20%, at least 25%, at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 100% relative to a control therapeutic agent such as phosphate buffered saline (“PBS”).
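As arithmetic, the percentages above compare the expression level measured after the therapy with the level measured under the control (e.g., PBS). A minimal Python sketch, assuming both levels are on the same linear (not log) scale and that "modulates" means the absolute percent change in either direction; the function names are hypothetical.

```python
def percent_modulation(treated_level, control_level):
    """Absolute percent change of expression relative to the control level."""
    if control_level <= 0:
        raise ValueError("control level must be positive")
    return 100.0 * abs(treated_level - control_level) / control_level


def meets_threshold(treated_level, control_level, min_percent=5.0):
    """True if the therapy modulates expression by at least min_percent."""
    return percent_modulation(treated_level, control_level) >= min_percent


print(percent_modulation(60.0, 100.0))     # 40.0 (a decrease)
print(meets_threshold(103.0, 100.0, 5.0))  # False (only 3%)
```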

As used herein, the terms “treat”, “treatment” and “treating” refer to the prevention of one or more colorectal pathologies, including polyp formation or the formation of one or more subtypes of polyps; the prevention of the development, recurrence, onset or transformation of one or more colorectal pathologies; and the reduction or amelioration of the progression and/or severity of one or more colorectal pathologies including polyps or subtypes thereof, resulting from the administration of one or more compounds identified in accordance with the methods of the invention, or a combination of one or more compounds identified in accordance with the invention and another therapy.

As used herein, a “tissue nucleic acid sample” refers to nucleic acids isolated and/or derived from tissue, for example polyp tissue, colon tissue, rectum tissue, lymphoid tissue, and the like. In some embodiments, a tissue nucleic acid sample is total RNA, mRNA or a nucleic acid corresponding to RNA, for example, cDNA. A tissue nucleic acid sample can also include a PCR product derived from total RNA, mRNA or cDNA.

(C) Samples for Use in the Invention

Samples for use in the invention include any of various types of molecules, cells and/or tissues which can be isolated and/or derived from a test subject and/or control subject, and which contain one or more biomarker products. The sample can be isolated and/or derived from any fluid, cell or tissue. The sample can also be one which is isolated and/or derived from any fluid and/or tissue which predominantly comprises blood cells.

The sample which is isolated and/or derived from an individual can be assayed for gene expression products, particularly gene expression products differentially expressed in individuals with or without one or more colorectal pathologies. In one embodiment, the sample is a fluid sample, a lymph sample, a lymph tissue sample or a blood sample. In one embodiment, the sample is isolated and/or derived from peripheral blood. Alternatively, the sample may be isolated and/or derived from other sources, including any one of various types of lymphoid tissue.

Examples of samples isolated and/or derived from blood include samples of whole blood, serum-reduced whole blood, serum-depleted blood, and serum-depleted and erythrocyte-depleted blood.

Unless otherwise indicated herein, samples obtained from any individual may be used in accordance with the methods of the invention. Examples of individuals from which such a sample may be obtained and utilized in accordance with the methods of the invention include, but are not limited to, individuals suspected of having one or more colorectal pathologies; individuals diagnosed as having one or more colorectal pathologies; individuals that have not been diagnosed as having one or more colorectal pathologies; and individuals who have been confirmed as not having one or more colorectal pathologies.

In a further embodiment, the individual from whom a sample may be obtained is a test subject for whom it is unknown whether the person has one or more colorectal pathologies.

Blood

In one aspect of the invention, a sample of blood is obtained from an individual according to methods well known in the art. A sample of blood may be obtained from an individual, for example a subject having one or more colorectal pathologies, suspected of having one or more colorectal pathologies or not having one or more colorectal pathologies. In some embodiments, a drop of blood is collected from a simple pin prick made in the skin of an individual. Blood may be drawn from an individual from any part of the body (e.g., a finger, a hand, a wrist, an arm, a leg, a foot, an ankle, a stomach, and a neck) using techniques known to one of skill in the art, in particular methods of phlebotomy known in the art.

The amount of blood collected will vary depending upon the site of collection, the amount required for a method of the invention, and the comfort of the individual. However, an advantage of one embodiment of the present invention is that the amount of blood required to implement the methods of the present invention can be so small that more invasive procedures are not required to obtain the sample. For example, in some embodiments, all that is required is a drop of blood. This drop of blood can be obtained, for example, from a simple pinprick. In some embodiments, any amount of blood is collected that is sufficient to detect the expression of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18 or all of the genes in Table 1, Table 2, Table 11, and Table 12. As such, in some embodiments, the amount of blood that is collected is 1 ml or less, 0.5 ml or less, 0.1 ml or less, or 0.01 ml or less. However, the present invention is not limited to such embodiments. In some embodiments more blood is available and in some embodiments, more blood can be used to effect the methods of the present invention. As such, in various specific embodiments, 0.001 ml, 0.005 ml, 0.01 ml, 0.05 ml, 0.1 ml, 0.15 ml, 0.2 ml, 0.25 ml, 0.5 ml, 0.75 ml, 1 ml, 1.5 ml, 2 ml, 3 ml, 4 ml, 5 ml, 10 ml, 15 ml or more of blood is collected from a subject. In another embodiment, 0.001 ml to 15 ml, 0.01 ml to 10 ml, 0.1 ml to 10 ml, 0.1 ml to 5 ml, or 1 ml to 5 ml of blood is collected from an individual. In a further embodiment, 0.001-100 ml, preferably 0.01-50 ml, more preferably 0.01-25 ml and most preferably 0.01-1 ml of blood is collected from an individual.

In some embodiments of the present invention, blood is stored within a K3/EDTA tube (e.g., from Becton Dickinson). In another embodiment, one can utilize tubes for storing blood which contain stabilizing agents such as those disclosed in U.S. Pat. No. 6,617,170 (which is incorporated herein by reference). In another embodiment, the PAXgene™ blood RNA system provided by PreAnalytiX, a Qiagen/BD company, may be used to collect blood. In yet another embodiment, the Tempus™ blood RNA collection tubes offered by Applied Biosystems may be used. Tempus™ collection tubes provide a closed evacuated plastic tube containing RNA stabilizing reagent for whole blood collection.

In some embodiments, the blood collected is utilized immediately or within 1 hour, 2 hours, 3 hours, 4 hours, 5 hours or 6 hours, or is optionally stored at temperatures such as 4° C. or −20° C. prior to use in accordance with the methods of the invention. In some embodiments, a portion of the blood sample is used in accordance with the invention at a first instance of time whereas one or more remaining portions of the blood sample (or fractions thereof) are stored for a period of time for later use. For longer-term storage, storage methods well known in the art, such as storage at cryo temperatures (e.g., below −60° C.), can be used. In some embodiments, in addition to or instead of storage of the blood, plasma, serum, isolated nucleic acid or proteins are stored for a period of time for later use in accordance with methods known in the art.

In one aspect, whole blood is obtained from an individual according to the methods of phlebotomy well known in the art. Whole blood includes blood which can be used as is, and includes blood wherein the serum or plasma has been removed or reduced, and the RNA or mRNA from the remaining blood sample has been isolated in accordance with methods well known in the art (e.g., using, in some embodiments, gentle centrifugation at 300 to 800×g for 5 to 10 minutes). In a specific embodiment, whole blood (i.e., unfractionated blood) obtained from a subject is mixed with lysing buffer (e.g., Lysis Buffer (1 L): 0.6 g EDTA; 1.0 g KHCO₃; 8.2 g NH₄Cl; adjusted to pH 7.4 (using NaOH)), the sample is centrifuged and the cell pellet retained, and RNA or mRNA is extracted in accordance with methods known in the art (“lysed blood”) (see for example Sambrook et al.). In one embodiment, unfractionated whole blood is preferred because it avoids the costly and time-consuming process of separating out the cell types within the blood (Kimoto, 1998, Mol. Gen. Genet 258:233-239; Chelly J et al., 1989, Proc. Nat. Acad. Sci. USA 86:2617-2621; Chelly J et al., 1988, Nature 333:858-860).
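The lysis buffer recipe is given per liter, so smaller or larger batches scale linearly. The Python sketch below is illustrative only: the dictionary and function names are hypothetical, and the potassium salt is assumed to be potassium bicarbonate (KHCO₃), as in the conventional ammonium chloride/potassium bicarbonate red cell lysis buffer. The final pH adjustment to 7.4 with NaOH is a bench step and is not modeled.

```python
# Grams of each component per 1 L of lysis buffer, per the recipe above.
LYSIS_BUFFER_G_PER_L = {
    "EDTA": 0.6,
    "KHCO3": 1.0,   # assumed to be potassium bicarbonate
    "NH4Cl": 8.2,
}


def lysis_buffer_masses(volume_ml):
    """Scale the per-liter recipe to volume_ml of buffer (grams per component)."""
    if volume_ml <= 0:
        raise ValueError("volume must be positive")
    return {name: grams * volume_ml / 1000.0
            for name, grams in LYSIS_BUFFER_G_PER_L.items()}


for name, grams in lysis_buffer_masses(250).items():
    print(f"{name}: {grams:.3f} g")
```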

In some embodiments of the present invention, whole blood collected from an individual is fractionated (i.e., separated into components) before isolating products of the biomarkers from the sample. In one embodiment, blood is serum depleted (or serum reduced). In another embodiment, the blood is plasma depleted (or plasma reduced). In yet other embodiments, blood is erythrocyte depleted or reduced. In some embodiments, erythrocyte reduction is performed by preferentially lysing the red blood cells. In other embodiments, erythrocyte depletion or reduction is performed by lysing the red blood cells and further fractionating the remaining cells. In yet other embodiments, erythrocyte depletion or reduction is performed but the remaining cells are not further fractionated. In other embodiments, blood cells are separated from whole blood collected from an individual using other techniques known in the art. For example, blood collected from an individual can be subjected to Ficoll-Hypaque (Pharmacia) gradient centrifugation. Such centrifugation may separate various types of cells from a blood sample. In particular, Ficoll-Hypaque gradient centrifugation is useful to isolate peripheral blood leukocytes (PBLs) which can be used in accordance with the methods of the invention.

By way of example but not limitation, macrophages can be obtained as follows. Mononuclear cells are isolated from peripheral blood of a subject by syringe removal of blood followed by Ficoll-Hypaque gradient centrifugation. Tissue culture dishes are pre-coated with the subject's own serum or with AB+ human serum and incubated at 37° C. for one hour. Non-adherent cells are removed by pipetting. Cold (4° C.) 1 mM EDTA in phosphate-buffered saline is added to the adherent cells left in the dish and the dishes are left at room temperature for fifteen minutes. The cells are harvested, washed with RPMI buffer and suspended in RPMI buffer. Increased numbers of macrophages can be obtained by incubating at 37° C. with macrophage-colony stimulating factor (M-CSF). Antibodies against macrophage-specific surface markers, such as Mac-1, can be labeled by conjugation of an affinity compound to such molecules to facilitate detection and separation of macrophages. Affinity compounds that can be used include but are not limited to biotin, photobiotin, fluorescein isothiocyanate (FITC), or phycoerythrin (PE), or other compounds known in the art. Cells retaining labeled antibodies are then separated from cells that do not bind such antibodies by techniques known in the art such as, but not limited to, various cell sorting methods, affinity chromatography, and panning.

Blood cells can be sorted using a fluorescence activated cell sorter (FACS). Fluorescence activated cell sorting (FACS) is a known method for separating particles, including cells, based on the fluorescent properties of the particles. See, for example, Kamarch, 1987, Methods Enzymol 151:150-165. Laser excitation of fluorescent moieties in the individual particles results in a small electrical charge allowing electromagnetic separation of positive and negative particles from a mixture. An antibody or ligand used to detect a blood cell antigenic determinant present on the cell surface of particular blood cells is labeled with a fluorochrome, such as FITC or phycoerythrin. The cells are incubated with the fluorescently labeled antibody or ligand for a time period sufficient to allow the labeled antibody or ligand to bind to cells. The cells are processed through the cell sorter, allowing separation of the cells of interest from other cells. FACS sorted particles can be directly deposited into individual wells of microtiter plates to facilitate separation.

Magnetic beads can also be used to separate blood cells in some embodiments of the present invention. For example, blood cells can be sorted using a magnetic activated cell sorting (MACS) technique, a method for separating particles based on their ability to bind magnetic beads (0.5-100 μm diameter). A variety of useful modifications can be performed on the magnetic microspheres, including covalent addition of an antibody which specifically recognizes a cell-solid phase surface molecule or hapten. A magnetic field is then applied to physically manipulate the selected beads. In a specific embodiment, antibodies to a blood cell surface marker are coupled to magnetic beads. The beads are then mixed with the blood cell culture to allow binding. Cells are then passed through a magnetic field to separate out cells having the blood cell surface markers of interest. These cells can then be isolated.

In some embodiments, the surface of a culture dish may be coated with antibodies and used to separate blood cells by a method called panning. Separate dishes can be coated with antibody specific to particular blood cells. Cells can be added first to a dish coated with blood cell specific antibodies of interest. After thorough rinsing, the cells left bound to the dish will be cells that express the blood cell markers of interest. Examples of cell surface antigenic determinants or markers include, but are not limited to, CD2 for T lymphocytes and natural killer cells, CD3 for T lymphocytes, CD11a for leukocytes, CD28 for T lymphocytes, CD19 for B lymphocytes, CD20 for B lymphocytes, CD21 for B lymphocytes, CD22 for B lymphocytes, CD23 for B lymphocytes, CD29 for leukocytes, CD14 for monocytes, CD41 for platelets, CD61 for platelets, CD66 for granulocytes, CD67 for granulocytes and CD68 for monocytes and macrophages.

Whole blood can be separated into cell types such as leukocytes, platelets, erythrocytes, etc., and such cell types can be used in accordance with the methods of the invention. Leukocytes can be further separated into granulocytes and agranulocytes using standard techniques, and such cells can be used in accordance with the methods of the invention. Granulocytes can be separated into cell types such as neutrophils, eosinophils, and basophils using standard techniques, and such cells can be used in accordance with the methods of the invention. Agranulocytes can be separated into lymphocytes (e.g., T lymphocytes and B lymphocytes) and monocytes using standard techniques, and such cells can be used in accordance with the methods of the invention. T lymphocytes can be separated from B lymphocytes, and helper T cells separated from cytotoxic T cells, using standard techniques, and such cells can be used in accordance with the methods of the invention. Separated blood cells (e.g., leukocytes) can be frozen by standard techniques prior to use in the present methods.

(D) RNA Preparation

In one aspect of the invention, RNA is isolated from an individual in order to measure the RNA products of the biomarkers of the invention. RNA is isolated from a sample from individuals diagnosed as having one or more colorectal pathologies including one or more polyps or one or more subtypes of polyps, individuals not having one or more colorectal pathologies, not having one or more polyps or not having a subtype of polyps, or test subjects.

In some embodiments, RNA is isolated from erythrocyte-depleted blood by the following protocol. Lysis Buffer is added to the blood sample in a ratio of 3 parts Lysis Buffer to 1 part blood (Lysis Buffer (1 L): 0.6 g EDTA; 1.0 g KHCO₃; 8.2 g NH₄Cl; adjusted to pH 7.4 (using NaOH)). The sample is mixed and placed on ice for 5-10 minutes until transparent. The lysed sample is centrifuged at 1000 rpm for 10 minutes at 4° C., and the supernatant is aspirated. The pellet is resuspended in 5 ml Lysis Buffer and centrifuged again at 1000 rpm for 10 minutes at 4° C. Pelleted cells are homogenized using TRIzol® (GIBCO/BRL) in a ratio of approximately 6 ml of TRIzol® for every 10 ml of the original blood sample and vortexed well. Samples are left for 5 minutes at room temperature. RNA is extracted using 1.2 ml of chloroform per 1 ml of TRIzol®. The sample is centrifuged at 12,000×g for 5 minutes at 4° C. and the upper layer is collected. Isopropanol is added to the upper layer in a ratio of 0.5 ml per 1 ml of TRIzol®. The sample is left overnight at −20° C. or for one hour at −20° C. RNA is pelleted in accordance with known methods, the RNA pellet is air dried, and the pellet is resuspended in DEPC-treated ddH₂O. RNA samples can also be stored in 75% ethanol, where the samples are stable at room temperature for transportation.
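Because every reagent in this protocol is specified as a ratio to the starting blood volume or to the TRIzol® volume, the per-sample volumes can be worked out in advance. This hypothetical Python helper encodes the ratios as they are written in the protocol. The chloroform ratio is reproduced as written; the widely used TRIzol® reagent instructions specify 0.2 ml of chloroform per 1 ml of TRIzol®, so that constant should be checked before use.

```python
def rna_prep_volumes(blood_ml):
    """Reagent volumes (ml) implied by the ratios in the protocol above."""
    if blood_ml <= 0:
        raise ValueError("blood volume must be positive")
    trizol_ml = 6.0 * blood_ml / 10.0  # ~6 ml TRIzol per 10 ml original blood
    return {
        "lysis_buffer_initial": 3.0 * blood_ml,  # 3 parts buffer : 1 part blood
        "lysis_buffer_wash": 5.0,                # fixed 5 ml resuspension
        "trizol": trizol_ml,
        "chloroform": 1.2 * trizol_ml,           # ratio copied as written
        "isopropanol": 0.5 * trizol_ml,          # 0.5 ml per 1 ml TRIzol
    }


for reagent, ml in rna_prep_volumes(2.5).items():
    print(f"{reagent}: {ml:.2f} ml")
```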

In other aspects, RNA is prepared by first collecting blood into a PAXgene™ collection tube and then subsequently isolating the RNA using the PAXgene™ blood RNA isolation system provided by PreAnalytiX, a Qiagen/BD company. In another embodiment, RNA is prepared by first collecting blood into any known stabilizing solution (e.g., a PAXgene™ collection tube or a TEMPUS® collection tube) and then isolating the RNA using any method known to a person skilled in the art.

In other aspects, globin-reduced or globin-depleted RNA is prepared. In one embodiment, RNA is isolated first and then subsequently treated to remove globin mRNA using any technique known in the art. For example, one can hybridize DNA primers and/or probes specific for globin RNA and utilize RNase H to selectively degrade globin mRNA. In other embodiments, RNA is isolated in a manner which removes the globin RNA during the RNA isolation steps (for example, reducing globin RNA by selectively removing it using globin primers and/or probes attached to paramagnetic particles).

In other aspects of the invention, RNA is prepared using one or more known commercial kits for isolating RNA (including isolating total RNA or mRNA and the like), such as oligo dT based purification methods, Qiagen® RNA isolation methods, the LeukoLOCK™ Total RNA Isolation System, MagMAX-96 Blood Technology from Ambion, the Promega® polyA mRNA isolation system and the like.

Purity and integrity of RNA can be assessed by absorbance at 260/280 nm and by agarose gel electrophoresis followed by inspection under ultraviolet light. In some embodiments, RNA integrity is assessed using more sensitive techniques such as the Agilent 2100 Bioanalyzer with an RNA 6000 Nano chip.
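The absorbance check reduces to two small calculations: the A260/A280 ratio and, as a by-product, an RNA concentration estimate. The Python sketch below relies on the conventional factor that an A260 of 1.0 (1 cm path) corresponds to roughly 40 µg/ml of RNA. A ratio of about 2.0 is commonly taken to indicate RNA largely free of protein, with lower values suggesting contamination. Neither the factor nor an acceptance cutoff is specified in the text; both are conventions assumed here, and the function names are hypothetical.

```python
# Conventional factor: an A260 of 1.0 (1 cm path) ~ 40 ug/ml RNA.
RNA_UG_PER_ML_PER_A260 = 40.0


def a260_a280_ratio(a260, a280):
    """Purity ratio; values near 2.0 are conventionally read as clean RNA."""
    if a280 <= 0:
        raise ValueError("A280 must be positive")
    return a260 / a280


def rna_concentration_ug_per_ml(a260, dilution_factor=1.0):
    """Estimate RNA concentration of the undiluted sample from A260."""
    return a260 * RNA_UG_PER_ML_PER_A260 * dilution_factor


print(a260_a280_ratio(0.50, 0.25))              # 2.0
print(rna_concentration_ug_per_ml(0.50, 10))    # 200.0
```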

(E) Biomarkers of the Invention

In one aspect, the invention provides biomarkers and biomarker combinations wherein the measure of the level of expression of the product or products of said biomarkers is indicative of the presence of one or more colorectal pathologies.

Table 1 is a list of biomarkers of one aspect of the invention. Each biomarker is differentially expressed, as determined by microarray assays, in samples from individuals having or not having polyps. The table provides the HUGO gene name, symbol and locus link ID; the RNA and protein accession numbers; and also includes both the p value (which represents the statistical significance of the observed differential expression) and a measure of the fold change between the average measured level of individuals having polyps and the average measured level of individuals not having polyps.

Table 2 is a selection of those genes listed in Table 1 and lists the gene symbol and the associated locus link ID for the biomarkers of the invention. The table also provides the fold change and direction of differential gene expression in individuals having polyps as compared to individuals not having polyps. As described, differential expression of the genes between individuals having or not having polyps can be identified using a non-parametric Wilcoxon-Mann-Whitney test or a parametric t test. The results of the tests are also shown in Table 2.
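The two tests named above can be illustrated by the statistics they rest on. This Python sketch computes only the Mann-Whitney U statistic and a two-sample t statistic; the text says only "a parametric t test", so the unequal-variance (Welch) form used here is an assumption. P values would come from the corresponding reference distributions, for example via `scipy.stats.mannwhitneyu` and `scipy.stats.ttest_ind` with `equal_var=False`. The sample values are hypothetical.

```python
import math
from statistics import mean, variance


def mann_whitney_u(x, y):
    """U statistic for sample x: pairs (xi, yj) with xi > yj, ties counted 0.5."""
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    return u


def welch_t(x, y):
    """Two-sample t statistic without assuming equal variances (Welch)."""
    se = math.sqrt(variance(x) / len(x) + variance(y) / len(y))
    return (mean(x) - mean(y)) / se


polyp = [5.1, 5.4, 5.9, 6.2]      # hypothetical expression levels
no_polyp = [4.8, 5.0, 5.2, 5.3]
print(mann_whitney_u(polyp, no_polyp))   # 14.0 (of a possible 16)
print(round(welch_t(polyp, no_polyp), 3))
```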

Table 11 shows genes identified as differentially expressed in samples from individuals having “high risk polyps” as compared with individuals not having high risk polyps (i.e., having low risk polyps or having no pathology at all) using microarray as described in Example 2. The table provides the gene name, gene ID and a representative human RNA accession number, and also provides the p value, the fold change (between the average of individuals classified as having high risk polyps and the average of individuals having low risk polyps), and the coefficient of variation for both the high risk polyp individuals and the low risk polyp individuals (the standard deviation of the normalized intensity divided by the mean normalized intensity). Column 1 is AffySpotID, column 2 is Fold Change, column 3 is p value, column 4 is CV (Coefficient of Variation) (High Risk Polyp), column 5 is CV (Low Risk Polyp), column 6 is Gene ID, column 7 is the HUGO Gene Symbol, column 8 is the Human RNA Accession Number and column 9 is the Gene Description.
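The fold change and coefficient of variation reported in Table 11 follow directly from the definitions given: a ratio of group averages, and the standard deviation of normalized intensity divided by its mean. A minimal Python sketch, assuming the sample (n − 1) standard deviation and hypothetical intensity values:

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Standard deviation of normalized intensities divided by their mean."""
    return stdev(values) / mean(values)


def fold_change(group_a, group_b):
    """Ratio of group averages (e.g., high risk polyp vs low risk polyp)."""
    return mean(group_a) / mean(group_b)


high_risk = [2.0, 4.0]   # hypothetical normalized intensities
low_risk = [1.0, 2.0]
print(fold_change(high_risk, low_risk))            # 2.0
print(coefficient_of_variation([1.0, 2.0, 3.0]))   # 0.5
```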

Table 12 shows 48 biomarkers tested by QRT-PCR for differential expression in samples from individuals having colorectal cancer and individuals not having colorectal cancer. The table provides the gene symbol, locus link ID, and gene description for each biomarker. The table also includes the p value (which represents the statistical significance of the observed differential expression), the measure of the fold change between the average measured level of individuals having colorectal cancer and the average measured level of individuals not having colorectal cancer, and the direction of the differential expression between individuals having colorectal cancer and individuals not having colorectal cancer.

Other biomarkers of the invention are described within the specification. The invention thus encompasses the use of those methods known to a person skilled in the art to measure the expression of these biomarkers and combinations of biomarkers for each of the purposes outlined above.

As would be understood by a person skilled in the art, the locus link ID can be used to determine the sequence of all the RNA products and all the protein products of the biomarkers of the invention.

(F) Combinations of Biomarkers

In one embodiment, combinations of biomarkers of the present invention include any combination of the biomarkers listed in Table 1, Table 2, Table 11, or Table 12. For instance, the number of possible combinations of a subset of n genes chosen from m genes in any of the tables above is described in Feller, Intro to Probability Theory, Third Edition, volume 1, 1968, ed. J. Wiley, using the general formula:

$\frac{m!}{n!\,(m-n)!}$

For example, where n is 2 and m is 8, the number of combinations of biomarkers is:

$\frac{8!}{2!\,(8-2)!} = \frac{8 \times 7 \times 6 \times 5 \times 4 \times 3 \times 2 \times 1}{(2 \times 1)\,(6 \times 5 \times 4 \times 3 \times 2 \times 1)} = \frac{40320}{1440} = 28$

unique two-gene combinations. The measurement of the gene expression of each of these two-gene combinations can independently be used to determine whether a patient has one or more colorectal pathologies. In another specific embodiment in which m is 8 and n is 3, there are 8!/(3!(8−3)!) = 56 unique three-gene combinations. Each of these unique three-gene combinations can independently serve as a model for determining whether a patient has one or more colorectal pathologies.
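The counting formula and the enumeration of the combinations themselves can be checked with Python's standard library. In this sketch the gene names are hypothetical placeholders for 8 biomarkers; `itertools.combinations` yields each unordered subset exactly once, which is what makes every two-gene or three-gene combination a distinct candidate model.

```python
import math
from itertools import combinations


def n_combinations(m, n):
    """m! / (n! (m - n)!): ways to choose n biomarkers from m."""
    return math.factorial(m) // (math.factorial(n) * math.factorial(m - n))


# Hypothetical placeholder names standing in for 8 biomarkers.
genes = [f"gene_{i}" for i in range(1, 9)]
pairs = list(combinations(genes, 2))
triples = list(combinations(genes, 3))
print(n_combinations(8, 2), len(pairs))     # 28 28
print(n_combinations(8, 3), len(triples))   # 56 56
print(pairs[0])                             # ('gene_1', 'gene_2')
```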

(G) Testing Combinations of Biomarkers by Generating Formulas Resulting from One or More Classifiers

The invention further provides a means of testing combinations of biomarkers from Table 1, Table 2, Table 11, or Table 12, or subsets thereof, for their ability to test for one or more colorectal pathologies or one or more subtypes of colorectal pathology. Also provided are methods of evaluating the combinations tested for their ability to test an individual for the presence of one or more colorectal pathologies or one or more subtypes of colorectal pathology. In order to test combinations of biomarkers and generate classifiers, a mathematical model of the invention can be used. A mathematical model of the invention can be used to test each selected combination of biomarkers from all combinations of biomarkers or a selected subset thereof.

In some embodiments, it is useful to further select biomarkers to be tested as combinations. In one embodiment, one can select individual biomarkers on the basis of the p value as a measure of the likelihood that the individual biomarker can distinguish between the two phenotypic trait subgroups. Thus, in one embodiment, biomarkers are chosen to test in combination by input into a model wherein the p value of each biomarker is less than 0.5, less than 0.2, less than 0.1, less than 0.05, less than 0.01, less than 0.005, less than 0.001, less than 0.0005, less than 0.0001, less than 0.00005, less than 0.00001, less than 0.000005, less than 0.000001, etc. We have also surprisingly found that even biomarkers which demonstrate a p value of greater than 0.2 (and thus would normally not be considered to be a useful individual biomarker) do significantly increase the ability of a combination of biomarkers in which they are included to distinguish between two phenotypic trait subgroups. In other embodiments, biomarkers for input into the model to test in combination are chosen on the basis of the fold change of differential expression of the product of the biomarker between the two phenotypic trait subgroups. 
Note that in measuring differential fold change in blood, the fold change differences can be quite small; thus, in some embodiments, selection of biomarker subsets for input into the classifier is based on a differential fold change where the fold change is greater than 1.01, 1.02, 1.03, 1.04, 1.05, 1.06, 1.07, 1.08, 1.09, 1.1, 1.125, 1.15, 1.175, 1.2, 1.225, 1.25, 1.275, 1.30, greater than 1.3, greater than 1.4, greater than 1.5, greater than 1.6, greater than 1.7, greater than 1.8, greater than 1.9, greater than 2.0, greater than 2.1, greater than 2.2, greater than 2.3, greater than 2.4, greater than 2.5, greater than 2.6, greater than 2.7, greater than 2.8, greater than 2.9, greater than 3.0, greater than 3.1, greater than 3.2, greater than 3.3, greater than 3.4, greater than 3.5, greater than 4.0, and the like. In yet other embodiments, in order to select subsets of biomarkers to test in combination, one can also take into account the coefficient of variation as a measure of the variability of the data representing the level of expression of the product of the biomarker amongst individuals within a phenotypic trait subgroup. In some embodiments, it is helpful to select biomarkers on the basis of a combination of factors including p value, fold change, and coefficient of variation, as would be understood by a person skilled in the art. In some embodiments, biomarkers are first selected as outlined above on the basis of the p value resulting from the biomarker data, and then a subselection of said biomarkers is chosen on the basis of the differential fold change determined from the biomarker data. In other embodiments, biomarkers are first selected on the basis of differential fold change, and then a subselection is made on the basis of p value. In some embodiments, the use of one or more of the selection criteria and subsequent ranking permits the selection of the top 2.5%, 5%, 7.5%, 10%, 12.5%, 15%, 17.5%, 20%, 30%, 40%, 50% or more of the ranked biomarkers for input into the model. 
In some embodiments, the desired number of selected biomarkers can be 4,000; 3,000; 2,000; 1,000; 900; 800; 700; 600; 500; 400; 300; 200; 190; 180; 170; 160; 150; 140; 130; 120; 110; 100; 90; 80; 70; 60; 50; 40; 30; 20; or 10. In other embodiments, the selection criteria noted above can be set on the basis of the desired number of selected biomarkers for use in the model. As would be understood, one can therefore select all of the individually identified biomarkers, or subsets of the individually identified biomarkers, and test all possible combinations of the selected biomarkers to identify useful combinations of biomarkers. In another embodiment, one can select a subset of biomarkers and then test all possible combinations of 2, 3, 4, 5, 6, 7, 8, 9, or 10 biomarkers from that subset in order to identify useful combinations of biomarkers. The selection criteria used to determine the number of individual biomarkers to test in combination, and the number of possible combinations of biomarkers to evaluate, will depend upon the resources available for obtaining the biomarker data and/or the computer resources available for calculating and evaluating the classifiers resulting from the model.
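The two-stage selection described above (a p value cutoff, a fold-change subselection, then keeping a top fraction of the ranked biomarkers) can be sketched in Python as follows. This is a minimal illustration, not the procedure used to generate the tables: the input format, the default thresholds, and the treatment of down-regulated biomarkers (a fold change below 1 is compared through its reciprocal) are assumptions, and the example statistics are invented. The hypothetical `candidate_combinations` helper enumerates the exhaustive combinations mentioned above.

```python
from itertools import combinations
from math import comb


def select_biomarkers(stats, p_max=0.05, fc_min=1.1, top_fraction=0.10):
    """Filter on p value, then on fold change, then keep the top fraction.

    stats maps a biomarker name to (p_value, fold_change); this input
    format is a hypothetical choice made for illustration.
    """
    # Step 1: keep biomarkers whose p value is below the cutoff.
    by_p = {name: v for name, v in stats.items() if v[0] < p_max}
    # Step 2: sub-select on fold change; a ratio below 1 (down-regulation)
    # is compared via its reciprocal, an assumption of this sketch.
    by_fc = {name: v for name, v in by_p.items() if max(v[1], 1.0 / v[1]) > fc_min}
    # Step 3: rank the survivors by p value and keep the top fraction (at least one).
    ranked = sorted(by_fc, key=lambda name: by_fc[name][0])
    if not ranked:
        return []
    return ranked[:max(1, int(len(ranked) * top_fraction))]


def candidate_combinations(markers, sizes=(2, 3)):
    """Yield every combination of the selected markers for each requested size."""
    for k in sizes:
        yield from combinations(markers, k)


# Hypothetical summary statistics, for illustration only.
example_stats = {
    "A": (0.001, 1.5),
    "B": (0.01, 0.5),
    "C": (0.2, 2.0),
    "D": (0.03, 1.05),
    "E": (0.0001, 1.2),
}

# The search space grows as a binomial coefficient: comb(100, 3) is 161700.
n_triples_from_100 = comb(100, 3)
```

Because the number of combinations grows as a binomial coefficient, exhaustive testing quickly becomes expensive: 100 selected biomarkers already give 161,700 three-biomarker combinations, which is why the subset size depends on the available computer resources.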

The classifier generated by the mathematical model can subsequently be evaluated by determining the ability of the classifier to correctly call each individual for one of the two phenotypic traits of the population used to generate the classifier (i.e., having or not having one or more colorectal pathologies). In a preferred embodiment, the individuals of the training population used to derive the model are different from the individuals of the training population used to test the model. As would be understood by a person skilled in the art, this allows one to predict the ability of the combinations to properly characterize an individual whose phenotypic trait characterization is unknown.

The data which is input into the mathematical model can be any data which is representative of the expression level of the product of the biomarkers being evaluated. Mathematical models useful in accordance with the invention include those using supervised or unsupervised learning techniques. In a preferred embodiment of the invention, the mathematical model chosen uses supervised learning in conjunction with a “training population” to evaluate each of the possible combinations of biomarkers of the invention. In one embodiment of the invention, the mathematical model used is selected from the following: a regression model, a logistic regression model, a neural network, a clustering model, principal component analysis, nearest neighbour classifier analysis, linear discriminant analysis, quadratic discriminant analysis, a support vector machine, a decision tree, a genetic algorithm, classifier optimization using bagging, classifier optimization using boosting, classifier optimization using the Random Subspace Method, a projection pursuit, genetic programming and weighted voting. In a preferred embodiment, a logistic regression model is used. In another preferred embodiment, a neural network model is used.

The results of applying a mathematical model of the invention to the data will generate one or more classifiers using one or more biomarkers. In some embodiments, multiple classifiers are created which are satisfactory for the given purpose (e.g., all have sufficient AUC and/or sensitivity and/or specificity). In this instance, in some embodiments, a formula is generated which utilizes more than one classifier. For example, a formula can be generated which utilizes classifiers in series (e.g., first obtains the result of classifier A, then that of classifier B: classifier A differentiates pathology from non-pathology, and classifier B then determines whether the pathology is colorectal cancer or not colorectal cancer). In another embodiment, a formula can be generated which results from weighting the results of more than one classifier. For example, the result of each classifier can be given a score of 1, and an indication of the probability of a test subject having one or more colorectal pathologies is the aggregate score of the selected classifiers of a given formula. Other possible combinations and weightings of classifiers would be understood and are encompassed herein.
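A formula built from several classifiers, either in series or by summing a score of 1 for each positive call, might look like the following sketch. The two threshold classifiers and their cutoffs are hypothetical placeholders standing in for classifiers produced by a mathematical model.

```python
from typing import Callable, Sequence

# A classifier maps one subject's biomarker levels to a positive/negative call.
Classifier = Callable[[Sequence[float]], bool]


def series_formula(x: Sequence[float], clf_a: Classifier, clf_b: Classifier) -> str:
    """Apply classifiers in series: A (pathology vs none), then B (CRC vs other)."""
    if not clf_a(x):
        return "no pathology"
    return "colorectal cancer" if clf_b(x) else "pathology, not colorectal cancer"


def aggregate_score(x: Sequence[float], classifiers: Sequence[Classifier]) -> int:
    """Each classifier contributes a score of 1 when it calls the subject positive."""
    return sum(1 for clf in classifiers if clf(x))


# Hypothetical threshold classifiers on two biomarker levels, for illustration only.
def clf_a(x: Sequence[float]) -> bool:
    return x[0] > 1.0


def clf_b(x: Sequence[float]) -> bool:
    return x[1] > 2.0
```

The series form lets a second, more specialised classifier see only the subjects the first one flags; the aggregate form instead treats the classifiers as equal voters.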

Classifiers generated can be used to test an unknown or test subject. In one embodiment, the equations generated by logistic regression are used to answer the question of whether an individual has one or more colorectal pathologies or is “normal.” In yet another embodiment of the invention, the answer to the question above may be “non-determinable.”

In one embodiment of the invention, each classifier is evaluated for its ability to properly characterize each individual of the training population using those methods known to a person skilled in the art. For example, one can evaluate the classifier using cross validation, Leave One Out Cross Validation (LOOCV), n-fold cross validation, or jackknife analysis, using standard statistical methods as disclosed. In another embodiment of the invention, each classifier is evaluated for its ability to properly characterize those individuals of the training population which were not used to generate the classifier.
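Leave One Out Cross Validation can be illustrated with a short, dependency-free sketch. A nearest-centroid rule is used here only as a stand-in for whatever classifier the model generates; the `fit` callback interface and the single-biomarker data are assumptions made for the example.

```python
from typing import Callable, List, Sequence

Predictor = Callable[[Sequence[float]], int]


def fit_nearest_centroid(X: List[List[float]], y: List[int]) -> Predictor:
    """Toy classifier: assign a subject to the class with the closest mean profile."""
    centroids = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def predict(x: Sequence[float]) -> int:
        # Squared Euclidean distance to each class centroid.
        return min(centroids, key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, centroids[lab])))

    return predict


def loocv_accuracy(X, y, fit) -> float:
    """Leave one subject out, train on the rest, and score the held-out call."""
    correct = 0
    for i in range(len(X)):
        predict = fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        if predict(X[i]) == y[i]:
            correct += 1
    return correct / len(X)


# Hypothetical single-biomarker levels: 0 = no pathology, 1 = pathology.
X_demo = [[1.0], [1.2], [0.9], [3.0], [3.1], [2.9]]
y_demo = [0, 0, 0, 1, 1, 1]
```

Every subject is scored by a classifier that never saw it, so the resulting accuracy estimates performance on an individual whose status is unknown.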

In one embodiment, the method used to evaluate the classifier for its ability to properly characterize each individual of the training population is a method which evaluates the classifier's sensitivity (TPF, true positive fraction) and 1-specificity (FPF, false positive fraction). In one embodiment, the method used to test the classifier is Receiver Operating Characteristic (“ROC”) analysis, which provides several parameters to evaluate both the sensitivity and specificity of the result of the equation generated. In one embodiment using ROC analysis, the ROC area (area under the curve) is used to evaluate the equations. A ROC area greater than 0.5, 0.6, 0.7, 0.8, or 0.9 is preferred. A perfect ROC area score of 1.0 indicates both 100% sensitivity and 100% specificity. In some embodiments, classifiers are selected on the basis of the score. For example, where the scoring system used is the receiver operating characteristic (ROC) curve score determined by the area under the ROC curve, in some embodiments, those classifiers with scores of greater than 0.95, 0.9, 0.85, 0.8, 0.7, 0.65, 0.6, 0.55, 0.5 or 0.45 are chosen. In other embodiments, where specificity is important to the use of the classifier, a sensitivity threshold can be set and classifiers ranked on the basis of the resulting specificity. For example, classifiers with a specificity of greater than 0.95, 0.9, 0.85, 0.8, 0.7, 0.65, 0.6, 0.55, 0.5 or 0.45 can be chosen. Similarly, a specificity threshold can be set and classifiers ranked on the basis of sensitivity; for example, classifiers with a sensitivity greater than 0.95, 0.9, 0.85, 0.8, 0.7, 0.65, 0.6, 0.55, 0.5 or 0.45 can be chosen. Thus, in some embodiments, only the top 10 ranking classifiers, the top 20 ranking classifiers, or the top 100 ranking classifiers are selected.
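The ROC area and the “fix sensitivity, rank by specificity” selection described above can be computed directly from classifier scores. In this sketch the AUC is computed as the fraction of (positive, negative) pairs that the classifier orders correctly, which equals the area under the empirical ROC curve; the cutoff convention (a score at or above the cutoff is called positive), the function names and the example scores are assumptions.

```python
def roc_auc(scores, labels):
    """Probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def specificity_at_sensitivity(scores, labels, min_sensitivity):
    """Best specificity over all cutoffs whose sensitivity meets the threshold."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best = 0.0
    for cut in sorted(set(scores)):
        tp = sum(1 for s, lab in zip(scores, labels) if lab == 1 and s >= cut)
        tn = sum(1 for s, lab in zip(scores, labels) if lab == 0 and s < cut)
        if tp / n_pos >= min_sensitivity:
            best = max(best, tn / n_neg)
    return best


def rank_by_specificity(results, min_sensitivity, top_n=10):
    """results maps a classifier name to (scores, labels); keep the top_n names."""
    ranked = sorted(
        results,
        key=lambda name: specificity_at_sensitivity(*results[name], min_sensitivity),
        reverse=True,
    )
    return ranked[:top_n]


# Hypothetical scores for six training subjects (1 = pathology).
scores_demo = [0.9, 0.8, 0.35, 0.6, 0.3, 0.2]
labels_demo = [1, 1, 1, 0, 0, 0]
```

For these scores the AUC is 8/9, and requiring 100% sensitivity limits the achievable specificity to 2/3.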

As would be understood by a person skilled in the art, the utility of the combinations and classifiers determined by a mathematical model will depend upon the phenotypes of the populations used to generate the data for input into the model. Examples of specific embodiments are described more thoroughly herein.

(H) Populations for Input into the Mathematical Models

Populations used for input should be chosen so as to result in a statistically significant classifier. In some embodiments, the reference or training population includes between 10 and 30 subjects. In another embodiment, the reference population contains between 30 and 50 subjects. In still other embodiments, the reference population includes two or more populations each containing between 50 and 100, between 100 and 500, between 500 and 1000, or more than 1000 subjects. The reference population includes two or more subpopulations. In a preferred embodiment, the phenotypic trait characteristics of the subpopulations are similar but for the diagnosis with respect to the presence of one or more colorectal pathologies; for example, the distributions of age and sex within the subpopulations are similar. It is also preferred that the subpopulations are of roughly equivalent numbers. It is to be understood that the methods herein do not require using data from every member of a population, but instead may rely on data from a subset of a population in question.

For example, for a reference or test population for input into a mathematical model to identify those biomarkers which are useful in identifying an individual as having any polyps or not having any polyps, the reference population is comprised of individuals having polyps (the first subpopulation) and individuals not having polyps (the second subpopulation). For purposes of characterizing the subpopulations as having or not having polyps, any verified method can be used, including digital rectal examination, fecal occult blood testing, rigid sigmoidoscopy, flexible sigmoidoscopy, double-contrast barium enema, colonoscopy, and histological examination. Preferably, only those individuals whose diagnoses are certain are utilized as part of the reference population.

In another embodiment, to identify those biomarkers which are useful in identifying an individual as having high risk polyps or not, the reference population is comprised of individuals having high risk polyps (the first subpopulation) and individuals not having high risk polyps (the second subpopulation), where high risk polyps are the following: Tubulovillous Adenoma; Villous Adenoma; Cancer; High Grade Dysplasia; and Tubular Adenoma where the Tubular Adenoma is greater than 10 mm. For purposes of characterizing the subpopulations as having or not having high risk polyps, any verified method can be used, including digital rectal examination, fecal occult blood testing, rigid sigmoidoscopy, flexible sigmoidoscopy, double-contrast barium enema, colonoscopy, and histological examination.

In yet another embodiment, to test biomarkers which are useful in identifying an individual as having early stage colorectal cancer or not, the reference population can, for example, be comprised of individuals having localized colorectal cancer as compared with individuals with other types of colorectal cancer (e.g., late stage).

In another embodiment, to identify those biomarkers which are useful in identifying an individual as having high risk polyps or not, the reference population is comprised of individuals having high risk polyps (the first subpopulation) and individuals not having high risk polyps (the second subpopulation), where high risk polyps are the following: Tubulovillous Adenoma; Villous Adenoma; Cancer; High Grade Dysplasia; and Tubular Adenoma.

(I) Data for Input into the Mathematical Models to Identify Classifiers for Testing for Colorectal Pathology

Data for input into the mathematical models is data representative of the level of the products of the biomarkers of the invention. As such, the data is the measure of the level of expression of the products of the biomarkers of the invention, including mRNA and/or protein.

In one embodiment of the invention, the RNA products of the biomarkers of the invention which are measured are the population of RNA products including the mRNA and all of the spliced variants of the mRNA. In another embodiment of the invention, the products measured are all of the mRNA products expressed in blood. In yet another embodiment of the invention, the products measured include one or more specific spliced variants of the mRNA which are expressed in blood. In yet another embodiment of the invention, the products measured are the RNA products listed in Table 3 or Table 13.

Protein products of the biomarkers of the invention are also included within the scope of the invention. To practice the invention, measurement of the protein products of the biomarkers of the invention can be used for purposes of testing for one or more colorectal pathologies. More particularly, measurement of those populations of protein products of the biomarkers which are differentially expressed in individuals having or not having any polyps is useful for purposes of testing and is encompassed herein.

In one embodiment of the invention, the protein products are those translated from the biomarkers listed in Table 1, Table 2, Table 11, or Table 12. In another embodiment, the protein products are those which are expressed in blood. In yet another embodiment of the invention, the protein products are those corresponding to the proteins listed in Table 3 or Table 13.

In yet another embodiment, data reflective of the level of expression of a combination of protein products and RNA products of the biomarkers are used. As would be understood by a person skilled in the art, other combinations of input data can be utilized to generate classifiers useful in accordance with the invention.

In other embodiments, as would be understood by one of ordinary skill in the art, data reflective of each biomarker in each member of the population is not necessary so long as there are data for sufficient members of each reference population to permit creation of a classifier. For example, data representative of biomarkers in 99%, 95%, 90%, 85%, 80%, or 75% of members of a population may suffice in given circumstances.

(J) Mathematical Models

Formulae for use with the methods described herein may generally have the form:

$$V = C + \sum_i \beta_i f(X_i) + \sum_{i,j} \beta_{ij} f(X_i, X_j) + \sum_{i,j,k} \beta_{ijk} f(X_i, X_j, X_k) + \cdots$$

wherein V is a value indicating the probability that a test subject has one or more colorectal pathologies, $X_i$ is a level of one or more products of the ith biomarker in a sample from the test subject, $\beta_i$ is a coefficient for a term involving only the ith biomarker, $\beta_{ij}$ is a coefficient for a term that is a function of the ith and jth biomarkers, $\beta_{ijk}$ is a coefficient for a term that is a function of the ith, jth and kth biomarkers, and C is a constant. Still other terms may appear in this formula, such as terms depending on four or more biomarkers.

By ‘indicates’ is meant that V might be an actual probability (a number varying between 0 and 1), or V might be a quantity from which a probability can be readily derived.

There are various forms of functions $f(X_i, X_j, \ldots)$ that depend on expression levels of the various biomarkers. For example, the functions may be polynomials in those expression levels, i.e., involve products of the various biomarkers raised to numeric powers. Examples include: $X_i X_j^2$, $X_i X_j X_k$, $(X_i X_j)^{1/3}$, and $X_i X_j + X_i X_k$. The functions may additionally or alternatively involve logarithms, exponentials, or still other functions of the expression levels.

In certain embodiments, the $f(X_i, X_j, \ldots)$ depend on ratios of the biomarker expression levels, i.e., $f(X_i, X_j) = X_i / X_j$.
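The general formula for V can be evaluated with main-effect terms and pairwise ratio terms as in the sketch below. Terms involving three or more biomarkers are omitted for brevity, and reading V as a log-odds that is converted to a probability with the logistic function is only one possible interpretation of “a quantity from which a probability can be readily derived.” All biomarker names, levels and coefficients are hypothetical.

```python
import math
from typing import Callable, Dict, Tuple


def evaluate_v(
    x: Dict[str, float],
    C: float,
    beta_main: Dict[str, float],
    beta_pair: Dict[Tuple[str, str], float],
    f_main: Callable[[float], float] = lambda a: a,
    f_pair: Callable[[float, float], float] = lambda a, b: a / b,
) -> float:
    """V = C + sum_i beta_i f(X_i) + sum_ij beta_ij f(X_i, X_j); three-way terms omitted."""
    v = C
    for i, beta in beta_main.items():
        v += beta * f_main(x[i])
    for (i, j), beta in beta_pair.items():
        v += beta * f_pair(x[i], x[j])  # default pairwise term is the ratio X_i / X_j
    return v


def to_probability(v: float) -> float:
    """If V is read as a log-odds (an assumption), map it to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-v))


# Hypothetical levels and coefficients, for illustration only.
levels = {"A": 2.0, "B": 4.0}
v_demo = evaluate_v(levels, C=-1.0, beta_main={"A": 0.5, "B": 0.25}, beta_pair={("A", "B"): 2.0})
```

Here V = −1 + 0.5·2 + 0.25·4 + 2·(2/4) = 2, which the logistic map turns into a probability of about 0.88.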

Regression Models

In some embodiments, the expression data for some or all of the biomarkers identified in the present invention are used in a regression model, such as but not limited to a logistic regression model or a linear regression model, so as to identify classifiers useful in diagnosing one or more colorectal pathologies. The regression model is used to test various combinations of two or more of the biomarkers identified in Table 1, Table 2, Table 11, or Table 12 to generate classifiers. In the case of regression models, the resulting classifiers are in the form of equations which provide a dependent variable Y, which represents the presence or absence of a given phenotype, where the data representing the expression of each of the biomarkers in the equation is multiplied by a weighted coefficient as generated by the regression model. The classifiers generated can be used to analyze expression data from a test subject and provide a result indicative of the probability of a test subject having one or more colorectal pathologies. In general, a multiple regression equation of interest can be written as

$$Y = \alpha + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k + \varepsilon$$

where Y, the dependent variable, indicates presence (when Y is positive) or absence (when Y is negative) of the biological feature (e.g., presence or absence of one or more colorectal pathologies) associated with the first subgroup. This model says that the dependent variable Y depends on k explanatory variables (the measured characteristic values for the k selected genes (e.g., the biomarkers) from subjects in the first and second subgroups in the reference population), plus an error term that encompasses various unspecified omitted factors. In the above-identified model, the parameter $\beta_1$ gauges the effect of the first explanatory variable $X_1$ on the dependent variable Y (e.g., a weighting factor), holding the other explanatory variables constant. Similarly, $\beta_2$ gives the effect of the explanatory variable $X_2$ on Y, holding the remaining explanatory variables constant.
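As a minimal sketch of fitting the multiple regression equation above, the coefficients α and β can be estimated by ordinary least squares with NumPy, coding the dependent variable as +1 for the first subgroup and −1 for the second (an assumed coding) and calling the feature present when the fitted Y is positive. The function names and data are invented for illustration.

```python
import numpy as np


def fit_linear(X: np.ndarray, y: np.ndarray):
    """Ordinary least squares for Y = alpha + beta_1 X_1 + ... + beta_k X_k + error."""
    design = np.column_stack([np.ones(len(X)), X])  # leading column of ones carries alpha
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[0], coef[1:]


def call_by_sign(X: np.ndarray, alpha: float, beta: np.ndarray) -> np.ndarray:
    """+1 (feature present) when the fitted Y is positive, otherwise -1."""
    return np.where(alpha + X @ beta > 0, 1, -1)


# Hypothetical data: one biomarker, Y coded +1 (pathology) / -1 (no pathology).
X_lin = np.array([[1.0], [1.5], [3.0], [3.5]])
y_lin = np.array([-1.0, -1.0, 1.0, 1.0])
alpha_lin, beta_lin = fit_linear(X_lin, y_lin)
```

For these four points the fitted slope is 4/4.25 ≈ 0.94 and the intercept −9/4.25 ≈ −2.12, so the sign of Y switches at a biomarker level of 2.25.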

A logistic regression model is a non-linear transformation of the linear regression. The logistic regression model is often referred to as the “logit” model and can be expressed as

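$$\ln\left[\frac{p}{1-p}\right] = \alpha + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k + \varepsilon \quad \text{or}$$

$$\frac{p}{1-p} = e^{\alpha}\, e^{\beta_1 X_1}\, e^{\beta_2 X_2} \times \cdots \times e^{\beta_k X_k}\, e^{\varepsilon}$$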

where,

α and ε are constants,

ln is the natural logarithm, log_e, where e = 2.71828 . . . ,

p is the probability that the event Y occurs, p(Y=1),

p/(1−p) is the “odds”,

ln [p/(1−p)] is the log odds, or “logit”, and

all other components of the model are the same as in the general linear regression equation described above. It will be appreciated by those of skill in the art that the terms α and ε can be folded into a single constant. Indeed, in preferred embodiments, a single term is used to represent α and ε. The “logistic” distribution is an S-shaped distribution function. The logistic transformation constrains the estimated probabilities (p) to lie between 0 and 1.
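Solving the logit equation for p shows why the estimated probabilities always lie between 0 and 1. Writing z for the linear predictor, with ε folded into α as described above:

```latex
\begin{aligned}
\ln\frac{p}{1-p} &= z, \qquad z = \alpha + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k \\
\frac{p}{1-p} &= e^{z} \\
p &= e^{z}(1 - p) \;\Longrightarrow\; p\,(1 + e^{z}) = e^{z} \\
p &= \frac{e^{z}}{1 + e^{z}} = \frac{1}{1 + e^{-z}}
\end{aligned}
```

Since e^{−z} > 0 for every real z, 0 < p < 1, and p > 1/2 exactly when z > 0.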

In some embodiments of the present invention, the logistic regression model is fit by maximum likelihood estimation (MLE). In other words, the coefficients (e.g., α, β₁, β₂, . . . ) are determined by maximum likelihood. A likelihood is a conditional probability (e.g., P(Y|X), the probability of Y given X). The likelihood function (L) measures the probability of observing the particular set of dependent variable values (Y₁, Y₂, . . . , Y_n) that occur in the sample data set. It is written as the probability of the product of the dependent variables:

$$L = \Pr(Y_1 \cdot Y_2 \cdots Y_n)$$

The higher the likelihood function, the higher the probability of observing the Ys in the sample. MLE involves finding the coefficients (α, β₁, β₂, . . . ) that make the log of the likelihood function (LL < 0) as large as possible, or −2 times the log of the likelihood function (−2LL) as small as possible. In MLE, some initial estimates of the parameters α, β₁, β₂, . . . are made. Then the likelihood of the data given these parameter estimates is computed. The parameter estimates are improved and the likelihood of the data is recalculated. This process is repeated until the parameter estimates do not change much (for example, a change of less than 0.01 or 0.001 in the probability). Examples of logistic regression and fitting logistic regression models are found in Hastie, The Elements of Statistical Learning, Springer, New York, 2001, pp. 95-100, which is incorporated herein in its entirety.
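The iterative procedure described above can be sketched with Newton-Raphson updates, one common way of carrying out maximum likelihood estimation for logistic regression (not necessarily the algorithm used by any particular software package). For simplicity the sketch stops when the coefficient estimates change by less than `tol`, rather than monitoring the change in the probability; the function names and the data are assumptions.

```python
import numpy as np


def _design(X: np.ndarray) -> np.ndarray:
    """Prepend a column of ones so that coef[0] plays the role of alpha."""
    return np.column_stack([np.ones(len(X)), X])


def log_likelihood(X: np.ndarray, y: np.ndarray, coef: np.ndarray) -> float:
    """LL = sum_i [y_i ln p_i + (1 - y_i) ln(1 - p_i)] for 0/1 outcomes y."""
    p = 1.0 / (1.0 + np.exp(-_design(X) @ coef))
    return float(np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))


def fit_logistic_mle(X: np.ndarray, y: np.ndarray, tol: float = 1e-3, max_iter: int = 50) -> np.ndarray:
    """Maximise LL by Newton-Raphson, starting from all-zero coefficients."""
    A = _design(X)
    coef = np.zeros(A.shape[1])                      # initial estimates
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-A @ coef))          # fitted probabilities
        grad = A.T @ (y - p)                         # gradient of LL
        info = A.T @ (A * (p * (1.0 - p))[:, None])  # negative Hessian of LL
        step = np.linalg.solve(info, grad)
        coef = coef + step                           # improved estimates
        if np.max(np.abs(step)) < tol:               # estimates barely changed
            break
    return coef


# Hypothetical, overlapping (not perfectly separable) data so the MLE is finite.
X_log = np.array([[0.5], [1.0], [1.5], [2.0], [2.5], [3.0], [3.5], [4.0]])
y_log = np.array([0, 0, 1, 0, 1, 0, 1, 1], dtype=float)
coef_log = fit_logistic_mle(X_log, y_log)
```

If the two subgroups were perfectly separated by the biomarker levels, the maximum likelihood estimates would not exist (the coefficients grow without bound), which is why the example data overlap.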

Neural Networks

In another embodiment, the expression measured for each of the biomarkers of the present invention can be used to train a neural network. A neural network is a two-stage regression or classification model. A neural network can be binary or non-binary. A neural network has a layered structure that includes a layer of input units (and the bias) connected by a layer of weights to a layer of output units. For regression, the layer of output units typically includes just one output unit. However, neural networks can handle multiple quantitative responses in a seamless fashion. As such, a neural network can be applied to allow identification of biomarkers which differentiate between more than two populations (i.e., more than two phenotypic traits). In one specific example, a neural network can be trained using expression data from the products of the biomarkers in Table 1, Table 2, Table 11, or Table 12 to identify those combinations of biomarkers which are specific for one or more colorectal pathologies. As a result, the trained neural network can be used to directly identify combinations of biomarkers useful to test for one or more colorectal pathologies. In some embodiments, the back-propagation neural network (see, for example, Abdi, 1994, “A neural network primer”, J. Biol. System. 2, 247-283) containing a single hidden layer of ten neurons (ten hidden units) found in the EasyNN-Plus version 4.0g software package (Neural Planner Software Inc.) is used.
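The sketch below trains a small back-propagation network with a single hidden layer of ten units, mirroring the architecture mentioned above. It is written from scratch with NumPy and is not EasyNN-Plus; the tanh hidden activation, logistic output unit, cross-entropy loss, learning rate, number of epochs and toy two-biomarker data are all assumptions chosen for illustration.

```python
import numpy as np


def train_mlp(X, y, hidden=10, lr=0.5, epochs=2000, seed=0):
    """One tanh hidden layer and one logistic output unit, trained by
    full-batch back-propagation on the mean cross-entropy loss."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(scale=0.5, size=(X.shape[1], hidden))
    b1 = np.zeros(hidden)
    w2 = rng.normal(scale=0.5, size=hidden)
    b2 = 0.0
    n = len(y)
    for _ in range(epochs):
        # Forward pass.
        h = np.tanh(X @ W1 + b1)
        p = 1.0 / (1.0 + np.exp(-(h @ w2 + b2)))
        # Backward pass: error at the output, then propagated through tanh.
        d_out = (p - y) / n
        d_hid = np.outer(d_out, w2) * (1.0 - h ** 2)
        # Gradient-descent updates (d_hid was computed with the pre-update w2).
        w2 -= lr * (h.T @ d_out)
        b2 -= lr * d_out.sum()
        W1 -= lr * (X.T @ d_hid)
        b1 -= lr * d_hid.sum(axis=0)
    return W1, b1, w2, b2


def predict_mlp(X, W1, b1, w2, b2):
    """Call a subject positive (1) when the output probability exceeds 0.5."""
    h = np.tanh(X @ W1 + b1)
    p = 1.0 / (1.0 + np.exp(-(h @ w2 + b2)))
    return (p > 0.5).astype(int)


# Hypothetical levels of two biomarkers for six subjects (1 = pathology).
X_nn = np.array([[0.0, 0.0], [0.2, 0.1], [0.1, 0.3], [1.0, 1.0], [0.9, 1.1], [1.2, 0.8]])
y_nn = np.array([0, 0, 0, 1, 1, 1], dtype=float)
params_nn = train_mlp(X_nn, y_nn)
```

Supporting more than two phenotypic traits would mean replacing the single logistic output unit with one output unit per class (for example, a softmax layer).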

Neural networks are described in Duda et al., 2001, Pattern Classification, Second Edition, John Wiley & Sons, Inc., New York; and Hastie et al., 2001, The Elements of Statistical Learning, Springer-Verlag, New York, which are incorporated herein in their entirety.

Singular Value Decomposition (SVD) and Principal Component Analysis (PCA)

Singular value decomposition (SVD) and Principal Component Analysis (PCA) are common techniques for analysis of multivariate data, and we have found gene expression data are well suited to analysis using SVD/PCA. SVD or equivalently, in this case, PCA, is defined as follows:

Let G be an m×n gene expression matrix with rank r, with m ≥ n and therefore r ≤ n, where m is the number of rows and n is the number of columns of the matrix. In the case of microarray data, $g_{ij}$ is the level of one or more products of the ith biomarker in the jth assay. The elements of the ith row of G form the n-dimensional vector $b_i$ (where b denotes a biomarker), which we refer to as the transcriptional response of the ith biomarker. Alternatively, the elements of the jth column of G form the m-dimensional vector $a_j$, which we refer to as the expression profile (or gene expression profile) of the jth assay.

The equation for singular value decomposition of G is the following:

$$G = U S V^{T}$$

where U is an m×n matrix, S is an n×n diagonal matrix, and $V^T$ is also an n×n matrix. The columns of U are called the left singular vectors, $\{u_k\}$, and form an orthonormal basis for the assay expression profiles, so that $u_i \cdot u_j = 1$ for $i = j$, and $u_i \cdot u_j = 0$ otherwise. The rows of $V^T$ contain the elements of the right singular vectors, $\{v_k\}$, and form an orthonormal basis for the gene transcriptional responses. The elements of S are only nonzero on the diagonal, and are called the singular values. Thus, $S = \mathrm{diag}(s_1, \ldots, s_n)$. Furthermore, $s_k > 0$ for $1 \le k \le r$, and $s_k = 0$ for $r+1 \le k \le n$. By convention, the ordering of the singular vectors is determined by high-to-low sorting of singular values, with the highest singular value in the upper left index of the S matrix. Note that for a square, symmetric, positive semidefinite matrix X, singular value decomposition is equivalent to diagonalization, or solution of the eigenvalue problem.
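The properties listed above can be checked numerically with NumPy's `numpy.linalg.svd`. With `full_matrices=False` it returns the thin decomposition in which U is m×n and V^T is n×n, matching the shapes given in the text. The matrix sizes and random entries are hypothetical stand-ins for gene expression data.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 6, 4                                  # hypothetical: 6 biomarkers x 4 assays (m >= n)
G = rng.normal(size=(m, n))                  # g_ij: level of biomarker i in assay j

# Thin SVD: U is m x n, s holds the n singular values (sorted high to low), Vt is n x n.
U, s, Vt = np.linalg.svd(G, full_matrices=False)
S = np.diag(s)

reconstructed = U @ S @ Vt                   # G = U S V^T
left_orthonormal = U.T @ U                   # u_i . u_j = 1 if i == j else 0
right_orthonormal = Vt @ Vt.T                # rows of V^T are orthonormal

# A rank-2 matrix of the same shape: singular values s_3 and s_4 vanish (r = 2).
G_rank2 = rng.normal(size=(m, 2)) @ rng.normal(size=(2, n))
s_rank2 = np.linalg.svd(G_rank2, compute_uv=False)
```

In floating point the "zero" singular values of the rank-2 matrix come out on the order of machine precision rather than exactly 0, so in practice the rank r is read off with a small tolerance.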

Other Mathematical Models

The pattern classification and statistical techniques described above are merely examples of the types of models that can be used to construct classifiers useful for diagnosing or detecting one or more colorectal pathologies. Other examples include clustering, as described on pages 211-256 of Duda and Hart, Pattern Classification and Scene Analysis, 1973, John Wiley & Sons, Inc., New York, incorporated herein by reference in its entirety; principal component analysis (see, for example, Jolliffe, 1986, Principal Component Analysis, Springer, New York, incorporated herein by reference); nearest neighbour classifier analysis (see, for example, Duda, Pattern Classification, Second Edition, 2001, John Wiley & Sons, Inc.; and Hastie, 2001, The Elements of Statistical Learning, Springer, New York); linear discriminant analysis (see, for example, Duda, Pattern Classification, Second Edition, 2001, John Wiley & Sons, Inc.; Hastie, 2001, The Elements of Statistical Learning, Springer, New York; and Venables & Ripley, 1997, Modern Applied Statistics with S-PLUS, Springer, New York); and Support Vector Machines (see, for example, Cristianini and Shawe-Taylor, 2000, An Introduction to Support Vector Machines, Cambridge University Press, Cambridge; Boser et al., 1992, “A training algorithm for optimal margin classifiers,” in Proceedings of the 5th Annual ACM Workshop on Computational Learning Theory, ACM Press, Pittsburgh, Pa., pp. 142-152; and Vapnik, 1998, Statistical Learning Theory, Wiley, New York, incorporated herein by reference).

Computer Implementation

The methods described herein are preferably performed by a suitably programmed computer. The computer system for use with the methods described herein, as further described herein, is configured to accept and to process data and may be a single-processor or multi-processor computer system. Examples of suitable computer systems include, but are not limited to, any one of various combinations of mainframe computers, minicomputers, personal computers, laptop computers, notebook computers, hand-held computers, personal digital assistants, mobile phones, set-top boxes, microprocessor-based consumer electronics, programmable consumer electronics, and the like. Additionally, the methods of the invention may be practiced on networked computers, CPU-clusters, workstations, and so-called mainframe computers. The computer system may be a locally accessed computer, a remotely accessed computer system (e.g., server), or a combination of both. Depending on the application and purpose, the computer system may have access or be accessible to “the internet” [World Wide Web (WWW)]. It will be appreciated that the computer system may be a stand-alone system or a distributed system comprising multiple devices communicating with each other through a network. Depending on the application and purpose, the computer system may be a static or mobile computer system. One of ordinary skill in the art will possess the necessary knowledge and skills for selecting, obtaining and utilizing a suitable computer system for practicing any aspect of the invention.

It is therefore consistent with the description herein that various methods and formulae are implemented, in the form of computer program instructions, and executed on a computer as also described herein. Suitable programming languages for expressing the program instructions include, but are not limited to, one or more languages selected from the group consisting of: C, C++, an embodiment of FORTRAN such as FORTRAN77 or FORTRAN90, Java, Visual Basic, Perl, Tcl/Tk, JavaScript, and Ada. It is to be understood that various aspects of the methods may be written in different computing languages from one another, where such languages are preferred for particular applications, and the various aspects are caused to communicate with one another by appropriate system-level tools available on a given computer.

The computer program instructions are stored in a computer memory during execution, and may additionally be stored on any of various forms of computer-readable media known in the art, such as, but not limited to, CD-ROM, CD-R, CD-RW, flash memory, memory cards, memory sticks, DVD-ROM, USB sticks, optical discs, or high capacity network storage drives. It is thus consistent with ordinary practice of the present invention that the computer program instructions can be delivered to a user on a transferable medium such as a CD-ROM, and also delivered over a computer network, such as by downloading over the Internet through a web interface.

FIG. 1 shows a schematic of a general-purpose computer system 100 suitable for practicing the methods described herein. The computer system 100, shown as a self-contained unit but not necessarily so limited, comprises at least one data processing unit (CPU) 102; a memory 104, which will typically include both high speed random access memory as well as non-volatile memory (such as one or more magnetic disk drives) but may be simply flash memory; a user interface 108; optionally a disk 110 controlled by a disk controller 112; and at least one optional network or other communication interface card 114 for communicating with other computers as well as other devices. At least the CPU 102, memory 104, user interface 108, disk controller where present, and network interface card communicate with one another via at least one communication bus 106.

Memory 104 stores procedures and data, typically including: an operating system 140 for providing basic system services; application programs 152 such as user level programs for viewing and manipulating data, evaluating formulae for the purpose of diagnosing a test subject, and authoring tools for assisting with the writing of computer programs; a file system 142; a user interface controller 144 for handling communications with a user via user interface 108; optionally one or more databases 146 for storing microarray data and other information; optionally a graphics controller 148 for controlling display of data; and optionally a floating point coprocessor 150 dedicated to carrying out mathematical operations. The methods of the present invention may also draw upon functions contained in one or more dynamically linked libraries, not shown in FIG. 1, but stored either in memory 104, or on disk 110, or accessible via network interface connection 114.

User interface 108 may comprise a display 128, a mouse 126, and a keyboard 130. Although shown as separate components in FIG. 1, one or more of these user interface components can be integrated with one another in embodiments such as handheld computers. Display 128 may be a cathode ray tube (CRT), or a flat-screen display such as an LCD based on active matrix or TFT embodiments, or may be an electroluminescent display, based on light emitting organic molecules such as conjugated small molecules or polymers. Other embodiments of a user interface not shown in FIG. 1 include, e.g., several buttons on a keypad, a card-reader, a touch-screen with or without a dedicated touching device, a trackpad, a trackball, or a microphone used in conjunction with voice-recognition software, or any combination thereof, or a security-device such as a fingerprint sensor or a retinal scanner that prohibits an unauthorized user from accessing data and programs stored in system 100.

System 100 may also be connected to an output device such as a printer (not shown), either directly through a dedicated printer cable connected to a serial or USB port, or wirelessly, or via a network connection.

The database 146 may instead, optionally, be stored on disk 110 in circumstances where the amount of data in the database is too great to be efficiently stored in memory 104. The database may also instead, or in part, be stored on one or more remote computers that communicate with computer system 100 through network interface connection 114.

The network interface 134 may be a connection to the internet or to a local area network via a cable and modem, or ethernet, firewire, or USB connectivity, or a digital subscriber line. Preferably the computer network connection is wireless, e.g., utilizing CDMA, GSM, or GPRS, or bluetooth, or standards such as 802.11a, 802.11b, or 802.11g.

It would be understood that various embodiments and configurations and distributions of the components of system 100 across different devices and locations are consistent with practice of the methods described herein. For example, a user may use a handheld embodiment that accepts data from a test subject and transmits that data across a network connection to another device or location wherein the data is analyzed according to a formula described herein. A result of such an analysis can be stored at the other location and/or additionally transmitted back to the handheld embodiment. In such a configuration, the act of accepting data from a test subject can include the act of a user inputting the information. The network connection can include a web-based interface to a remote site at, for example, a healthcare provider. Alternatively, system 100 can be a device such as a handheld device that accepts data from the test subject, analyzes the data, such as by inputting the data into a formula as further described herein, and generates a result that is displayed to the user. The result can then, optionally, be transmitted back to a remote location via a network interface such as a wireless interface. System 100 may further be configured to permit a user to transmit by e-mail results of an analysis directly to some other party, such as a healthcare provider, or a diagnostic facility, or a patient.

(K) Use of the Biomarkers of the Invention for Testing, Screening or Diagnosing a Test Subject

As would be understood by a person skilled in the art, the identification of one or more biomarkers allows for the testing, screening or diagnosis of one or more colorectal pathologies, including polyps or one or more subtypes of polyps, in a test subject by measuring the expression of the products of the biomarkers (genes) in the test subject.

In one embodiment, the results from the test subject are compared with a control, wherein the control can be results from one or more individuals having colorectal pathology, having polyps, or having one or more subtypes of polyps, and/or from one or more individuals not having any colorectal pathology, not having any polyps, or not having one or more specific subtypes of colorectal polyps.

In another embodiment, data reflective of the expression of the products of the biomarkers of the test subject can be input into a formula of the invention, resulting in a determination of whether the test subject has one or more colorectal pathologies. The formula used to diagnose an individual with an identified biomarker combination need not be the same formula that was used to test that combination for its ability to detect colorectal pathologies. Data representative of the products of the biomarkers of the invention (including RNA and/or protein) are input into a formula of the invention so as to determine a probability that a test subject has one or more colorectal pathologies. The data can be generated using any technique known to measure the level of expression of the RNA and/or protein products of the biomarkers of the invention.

In one embodiment, use of the formula results in a determination of whether the test subject has polyps or does not have polyps. For example, using logistic regression as the model, the value Y is used as a predictor of polyps: when Y > 0 a person is diagnosed as having polyps, and when Y < 0 a person is diagnosed as not having polyps. In yet another embodiment, a third category of prediction can be included in which the diagnosis is indeterminate. For example, one can determine the standard deviation (δ) inherent in the methodology used to measure gene expression of the biomarkers. If 0 < Y < δ or −δ < Y < 0, the test results are considered indeterminate.
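
The three-way decision rule described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the formula of the invention: Y is taken to be the linear predictor (log-odds) of a fitted logistic regression model, the gene names and coefficients are hypothetical placeholders, and δ is supplied by the user as the measurement standard deviation.

```python
import math
from typing import Mapping


def linear_predictor(expression: Mapping[str, float],
                     coefficients: Mapping[str, float],
                     intercept: float) -> float:
    """Y = b0 + sum(b_i * x_i): the log-odds of a logistic regression model."""
    return intercept + sum(beta * expression[gene]
                           for gene, beta in coefficients.items())


def probability_of_polyps(y: float) -> float:
    """Logistic transform of Y; Y > 0 corresponds to a probability above 0.5."""
    return 1.0 / (1.0 + math.exp(-y))


def classify(y: float, delta: float = 0.0) -> str:
    """Y > 0 -> polyps, Y < 0 -> no polyps; Y within (-delta, delta) -> indeterminate."""
    if delta < 0:
        raise ValueError("delta must be non-negative")
    if -delta < y < delta or y == 0:
        return "indeterminate"
    return "polyps" if y > 0 else "no polyps"


# Hypothetical two-gene model, for illustration only.
coefficients = {"GENE_A": 0.8, "GENE_B": -1.1}
subject = {"GENE_A": 2.0, "GENE_B": 1.0}
y = linear_predictor(subject, coefficients, intercept=0.2)
print(round(y, 3), classify(y, delta=0.5), classify(y, delta=1.0))
# prints: 0.7 polyps indeterminate
```

The same Y can be "polyps" or "indeterminate" depending on δ, which is the point of the third category: a noisier measurement method widens the band in which no call is made.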

(L) Polynucleotides Used to Measure the Products of the Biomarkers of the Invention

Polynucleotides capable of specifically or selectively binding to the RNA products of the biomarkers of the invention are used to measure the level of expression of the biomarkers. For example, oligonucleotides, cDNA, DNA, RNA, PCR products, synthetic DNA, synthetic RNA, or other combinations of naturally occurring or modified nucleotides which specifically and/or selectively hybridize to one or more of the RNA products of the biomarkers of the invention are useful in accordance with the invention.

In a preferred embodiment, oligonucleotides, cDNA, DNA, RNA, PCR products, synthetic DNA, synthetic RNA, or other combinations of naturally occurring or modified nucleotides which both specifically and selectively hybridize to one or more of the RNA products of the biomarkers of the invention are used.

(M) Techniques to Measure the RNA Products of the Biomarkers of the Invention

Array Hybridization

In one embodiment of the invention, the polynucleotides used to measure the RNA products of the biomarkers of the invention can be used as nucleic acid members localized on a support to form an array according to one aspect of the invention. A nucleic acid member can range from 8 to 1000 nucleotides in length, and members are chosen so as to be specific for the RNA products of the biomarkers of the invention. In one embodiment, these members are selective for the RNA products of the biomarkers of the invention. The nucleic acid members may be single- or double-stranded, and/or may be oligonucleotides or PCR fragments amplified from cDNA. In some embodiments, oligonucleotides are approximately 20-30 nucleotides in length. ESTs are, in some embodiments, 100 to 600 nucleotides in length. It will be understood by a person skilled in the art that portions of the expressed regions of the biomarkers of the invention can be used as probes on the array. More particularly, oligonucleotides complementary to the genes of the invention and/or cDNA or ESTs derived from the genes of the invention are useful. For oligonucleotide-based arrays, the selection of oligonucleotides corresponding to the gene of interest that are useful as probes is well understood in the art. In particular, it is important to choose regions that will permit hybridization to the target nucleic acids. The Tm of the oligonucleotide, its percent GC content, its degree of secondary structure and its length are important factors. See, for example, U.S. Pat. No. 6,551,784.
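
Two of the probe-selection factors named above, GC content and Tm, are easy to screen computationally. The sketch below is a rough illustration, not the selection procedure of the invention or of U.S. Pat. No. 6,551,784: it uses the Wallace rule for oligonucleotides under 14 nt and the common approximation 64.9 + 41(G+C − 16.4)/N for longer ones, both of which ignore salt and probe concentration, and the acceptance windows are hypothetical. Secondary structure is not checked.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G + C bases in a DNA oligonucleotide."""
    seq = seq.upper()
    if not seq or set(seq) - set("ACGT"):
        raise ValueError("sequence must be non-empty and contain only A, C, G, T")
    return (seq.count("G") + seq.count("C")) / len(seq)


def basic_tm(seq: str) -> float:
    """Rough melting temperature (deg C), ignoring salt and concentration.

    Wallace rule 2(A+T) + 4(G+C) below 14 nt; otherwise the common
    approximation 64.9 + 41 * (G+C - 16.4) / N.
    """
    seq = seq.upper()
    gc_fraction(seq)  # validates the sequence
    gc = seq.count("G") + seq.count("C")
    at = len(seq) - gc
    if len(seq) < 14:
        return 2.0 * at + 4.0 * gc
    return 64.9 + 41.0 * (gc - 16.4) / len(seq)


def passes_screen(seq: str, gc_range=(0.40, 0.60), tm_range=(50.0, 65.0)) -> bool:
    """Keep candidates whose GC content and Tm fall inside the (hypothetical) windows."""
    gc = gc_fraction(seq)
    tm = basic_tm(seq)
    return gc_range[0] <= gc <= gc_range[1] and tm_range[0] <= tm <= tm_range[1]


candidates = ["ATGCATGCATGCATGCATGC", "AAAAAAAAAAAAAAAAAAAA", "GCGCGGCCGCGGCGCCGGCG"]
print([c for c in candidates if passes_screen(c)])
```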

As described, microarrays can be used to identify and select genes differentially expressed in individuals having or not having one or more colorectal pathologies, one or more polyps or one or more subtypes of polyps, and can be used to diagnose or detect polyps or one or more subtypes of polyps using the biomarkers of the invention. Genes identified as differentially expressed using microarrays are listed in Tables 1 and 11.

Construction of a Nucleic Acid Array

In the subject methods, an array of nucleic acid members stably associated with the surface of a support is contacted with a sample comprising target nucleic acids under hybridization conditions sufficient to produce a hybridization pattern of complementary nucleic acid member/target complexes, in which one or more complementary nucleic acid members at unique positions on the array specifically hybridize to target nucleic acids. The identity of the target nucleic acids that hybridize can be determined by reference to the locations of the nucleic acid members on the array.
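
Because each nucleic acid member occupies a unique, known position, reading a hybridization pattern amounts to a lookup from spot position to probe identity. The sketch below illustrates that idea; the (row, column) coordinate scheme, the gene names and the fixed signal threshold are all hypothetical simplifications.

```python
from typing import Dict, List, Tuple

Position = Tuple[int, int]  # (row, column) of a spot on the array


def hybridized_targets(layout: Dict[Position, str],
                       signal: Dict[Position, float],
                       threshold: float) -> List[str]:
    """Return the identities of probes whose spot signal exceeds a threshold.

    `layout` records which nucleic acid member was placed at each position,
    so a bright spot identifies the target that hybridized there.
    """
    return sorted(layout[pos] for pos, value in signal.items()
                  if pos in layout and value > threshold)


layout = {(0, 0): "GENE_A", (0, 1): "GENE_B", (1, 0): "GENE_C"}
signal = {(0, 0): 1500.0, (0, 1): 80.0, (1, 0): 900.0}
print(hybridized_targets(layout, signal, threshold=200.0))
```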

The nucleic acid members may be produced using established techniques such as polymerase chain reaction (PCR) and reverse transcription (RT). These methods are similar to those currently known in the art (see, e.g., PCR Strategies, Michael A. Innis (Editor), et al. (1995) and PCR: Introduction to Biotechniques Series, C. R. Newton, A. Graham (1997)). Amplified nucleic acids are purified by methods well known in the art (e.g., column purification or alcohol precipitation). A nucleic acid is considered pure when it has been isolated so as to be substantially free of primers and incomplete products produced during the synthesis of the desired nucleic acid. In some embodiments, a purified nucleic acid will also be substantially free of contaminants which may hinder or otherwise mask the specific binding activity of the molecule.

An array, according to one aspect of the invention, comprises a plurality of nucleic acids attached to one surface of a support at a density exceeding 20 different nucleic acids/cm², wherein each of the nucleic acids is attached to the surface of the support in a non-identical pre-selected region (e.g., a microarray). Each associated sample on the array comprises a nucleic acid composition of known identity, usually of known sequence, as described in greater detail below. Any conceivable substrate may be employed in the invention.

In one embodiment, the nucleic acid attached to the surface of the support is DNA. In a preferred embodiment, the nucleic acid attached to the surface of the support is cDNA or RNA.

In another preferred embodiment, the nucleic acid attached to the surface of the support is cDNA synthesized by polymerase chain reaction (PCR). In some embodiments, a nucleic acid member in the array, according to the invention, is at least 10, 25 or 50 nucleotides in length. In one embodiment, a nucleic acid member is at least 150 nucleotides in length. In some embodiments, a nucleic acid member is less than 1000 nucleotides in length. More preferably, a nucleic acid member is less than 500 nucleotides in length.

In the arrays of the invention, the nucleic acid compositions are stably associated with the surface of a support. In one embodiment, the support may be flexible or rigid. By “stably associated” is meant that each nucleic acid member maintains a unique position relative to the support under hybridization and washing conditions. As such, the samples are non-covalently or covalently stably associated with the support surface. Examples of non-covalent association include non-specific adsorption, binding based on electrostatic interactions (e.g., ion pair interactions), hydrophobic interactions, hydrogen bonding interactions, specific binding through a specific binding pair member covalently attached to the support surface, and the like. Examples of covalent binding include covalent bonds formed between the nucleic acids and a functional group present on the surface of the rigid support (e.g., -OH), where the functional group may be naturally occurring or present as a member of an introduced linking group, as described in greater detail below.

The amount of nucleic acid present in each composition will be sufficient to provide for adequate hybridization and detection of target nucleic acid sequences during the assay in which the array is employed. Generally, the amount of each nucleic acid member stably associated with the support of the array is at least about 0.001 ng, preferably at least about 0.02 ng and more preferably at least about 0.05 ng, where the amount may be as high as 1000 ng or higher, but will usually not exceed about 20 ng. Where the nucleic acid member is “spotted” onto the support in a spot comprising an overall circular dimension, the diameter of the “spot” will generally range from about 10 to 5,000 μm, usually from about 20 to 2,000 μm and more usually from about 100 to 200 μm.

Control nucleic acid members may be present on the array, including nucleic acid members comprising oligonucleotides or nucleic acids corresponding to genomic DNA, housekeeping genes, vector sequences, plant nucleic acid sequences, negative and positive control genes, and the like. Control nucleic acid members are calibrating or control genes whose function is not to tell whether a particular “key” gene of interest is expressed, but rather to provide other useful information, such as the background or basal level of expression.

Other control nucleic acids are spotted on the array and used as target expression control nucleic acids and mismatch control nucleic acids to monitor non-specific binding or cross-hybridization to a nucleic acid in the sample other than the target to which the probe is directed. Mismatch probes thus indicate whether a hybridization is specific or not. For example, if the target is present, the perfectly matched probes should be consistently brighter than the mismatched probes. In addition, if all control mismatches are present, the mismatch probes can be used to detect a mutation.
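
The perfect-match versus mismatch comparison described above can be quantified per probe pair. The sketch below uses the discrimination score (PM − MM)/(PM + MM) and a simple majority rule to decide whether hybridization looks specific; the threshold `tau`, the majority cut-off and the function names are hypothetical choices for illustration, not a rule stated in this document.

```python
from typing import List, Sequence


def discrimination_scores(pm: Sequence[float], mm: Sequence[float]) -> List[float]:
    """(PM - MM) / (PM + MM) for each perfect-match/mismatch probe pair.

    Scores near +1 mean the perfect-match probe is much brighter (specific
    hybridization); scores near 0 or below suggest cross-hybridization.
    """
    if len(pm) != len(mm) or not pm:
        raise ValueError("need equal, non-empty lists of PM and MM intensities")
    return [(p - m) / (p + m) if p + m > 0 else 0.0 for p, m in zip(pm, mm)]


def looks_specific(pm, mm, tau: float = 0.015, min_fraction: float = 0.5) -> bool:
    """Hypothetical rule: specific if enough pairs have a score above tau."""
    scores = discrimination_scores(pm, mm)
    return sum(s > tau for s in scores) / len(scores) >= min_fraction


print(looks_specific([1000, 800, 1200], [200, 300, 1100]))
print(looks_specific([100, 100], [120, 110]))
```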

Use of a Microarray

Nucleic acid arrays according to the invention can be used to assay nucleic acids in a sample comprising one or more target nucleic acid sequences (e.g., RNA products of the biomarkers of the invention). The arrays of the subject invention can be used for testing, screening, and/or diagnosis of one or more colorectal pathologies, including polyps or one or more subtypes of polyps, or for screening for therapeutic targets and the like.

The arrays are also useful in broad-scale expression screening for drug discovery and research, such as determining the effect of a particular active agent on the expression pattern of the biomarkers of the invention, where such information is used to reveal drug efficacy and toxicity, as well as in environmental monitoring, disease research and the like.

Arrays can be made using at least one, and more preferably a combination, of these sequences as a means of diagnosing colon pathology or one or more subtypes of colon pathology.

The choice of a standard sample would be well understood by a person skilled in the art and would include a sample complementary to RNA isolated from one or more normal individuals, wherein a normal individual is an individual not having polyps.

Preparation of Nucleic Acid Sample for Hybridization to an Array

The samples for hybridization with the arrays according to the invention are, in some embodiments, derived from total RNA from blood. In another embodiment, targets for the arrays are derived from mRNA from blood.

The nucleic acid sample is capable of binding to a nucleic acid member of complementary sequence through one or more types of chemical bonds, usually through complementary base pairing mediated by hydrogen bond formation.

As used herein, a “nucleic acid derived from an mRNA transcript” or a “nucleic acid corresponding to an mRNA” refers to a nucleic acid for whose synthesis the mRNA transcript, or a sub-sequence thereof, has ultimately served as a template. Thus, a cDNA reverse transcribed from an mRNA, an RNA transcribed from that cDNA, a DNA amplified from the cDNA, an RNA transcribed from the amplified DNA, etc., are all derived from or correspond to the mRNA transcript, and detection of such derived or corresponding products is indicative of or proportional to the presence and/or abundance of the original transcript in a sample. Thus, suitable nucleic acid samples include, but are not limited to, mRNA transcripts of a gene or genes, cDNA reverse transcribed from the mRNA, cRNA transcribed from the cDNA, DNA amplified from a gene or genes, RNA transcribed from amplified DNA, and the like. The nucleic acid samples used herein are in some embodiments derived from blood. Nucleic acids can be single- or double-stranded DNA, RNA, or DNA-RNA hybrids synthesized from human blood using methods known in the art, for example, reverse transcription or PCR.

In the simplest embodiment, such a nucleic acid sample comprises total mRNA, or a nucleic acid sample corresponding to mRNA (e.g., cDNA), isolated from blood samples. In another embodiment, total mRNA is isolated from a given sample using, for example, an acid guanidinium-phenol-chloroform extraction method, and polyA+ mRNA is isolated by oligo dT column chromatography or by using (dT)n magnetic beads (see, e.g., Sambrook et al., Molecular Cloning: A Laboratory Manual (2nd ed.), Vols. 1-3, Cold Spring Harbor Laboratory (1989), or Current Protocols in Molecular Biology, F. Ausubel et al., ed., Greene Publishing and Wiley-Interscience, New York (1987)). In a preferred embodiment, total RNA is extracted using TRIzol® reagent (GIBCO/BRL, Invitrogen Life Technologies, Cat. No. 15596). The purity and integrity of the RNA are assessed by absorbance at 260/280 nm and by agarose gel electrophoresis followed by inspection under ultraviolet light.

In some embodiments, it is desirable to amplify the nucleic acid sample prior to hybridization, for example, when only limited amounts of sample can be used (e.g., a drop of blood). One of skill in the art will appreciate that, whatever amplification method is used, if a quantitative result is desired, care must be taken to use a method that maintains or controls for the relative frequencies of the amplified nucleic acids. Methods of “quantitative” amplification are well known to those of skill in the art. For example, quantitative PCR involves simultaneously co-amplifying a known quantity of a control sequence using the same primers. This provides an internal standard that may be used to calibrate the PCR reaction. The high-density array may then include probes specific to the internal standard for quantification of the amplified nucleic acid. Detailed protocols for quantitative PCR are provided in PCR Protocols, A Guide to Methods and Applications, Innis et al., Academic Press, Inc. N.Y., (1990).
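
The reasoning behind the co-amplified internal standard can be made concrete. If the target and the standard share primers, and are assumed to amplify with the same efficiency, the ratio of their products after any number of cycles equals the ratio of their starting amounts, so the known standard quantity calibrates the unknown. The idealized exponential model and the numbers below are illustrative assumptions.

```python
def copies_after_pcr(initial_copies: float, cycles: int, efficiency: float) -> float:
    """Idealized exponential PCR: N = N0 * (1 + E) ** cycles."""
    return initial_copies * (1.0 + efficiency) ** cycles


def target_from_internal_standard(target_signal: float,
                                  standard_signal: float,
                                  standard_copies: float) -> float:
    """Initial target copies = known standard copies * (target / standard signal).

    Valid only if target and standard amplify with the same efficiency,
    which is the reason for using the same primers for both.
    """
    if standard_signal <= 0:
        raise ValueError("standard signal must be positive")
    return standard_copies * target_signal / standard_signal


# Both templates share primers, so both see the same (assumed) efficiency.
target_product = copies_after_pcr(20_000, cycles=25, efficiency=0.9)
standard_product = copies_after_pcr(10_000, cycles=25, efficiency=0.9)
estimate = target_from_internal_standard(target_product, standard_product, 10_000)
print(round(estimate))  # 20000
```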

Other suitable amplification methods include, but are not limited to, polymerase chain reaction (PCR) (Innis, et al., PCR Protocols. A Guide to Methods and Application. Academic Press, Inc., San Diego, (1990)), ligase chain reaction (LCR) (see Wu and Wallace, 1989, Genomics, 4:560; Landegren, et al., 1988, Science, 241:1077; and Barringer, et al., 1990, Gene, 89:117), transcription amplification (Kwoh, et al., 1989, Proc. Natl. Acad. Sci. USA, 86:1173), and self-sustained sequence replication (Guatelli, et al., 1990, Proc. Natl. Acad. Sci. USA, 87:1874).

In a particularly preferred embodiment, the mRNA in the nucleic acid sample is reverse transcribed with a reverse transcriptase and a primer consisting of oligo dT and a sequence encoding the phage T7 promoter to provide a single-stranded DNA template. The second DNA strand is polymerized using a DNA polymerase. After synthesis of double-stranded cDNA, T7 RNA polymerase is added and RNA is transcribed from the cDNA template. Successive rounds of transcription from each single cDNA template result in amplified RNA. Methods of in vitro transcription are well known to those of skill in the art (see, e.g., Sambrook, supra), and this particular method is described in detail by Van Gelder, et al., 1990, Proc. Natl. Acad. Sci. USA, 87:1663-1667, who demonstrate that in vitro amplification according to this method preserves the relative frequencies of the various RNA transcripts. Moreover, Eberwine et al., Proc. Natl. Acad. Sci. USA, 89:3010-3014, provide a protocol that uses two rounds of amplification via in vitro transcription to achieve greater than 10⁶-fold amplification of the original starting material, thereby permitting expression monitoring even where biological samples are limited.

Labeling of Nucleic Acid Sample or Nucleic Acid Probe

Nucleic acid samples are labeled so as to allow detection of hybridization to an array of the invention. Any analytically detectable marker that is attached to or incorporated into a molecule may be used in the invention. An analytically detectable marker refers to any molecule, moiety or atom that can be analytically detected and quantified.

Detectable labels suitable for use in the present invention include any composition detectable by spectroscopic, photochemical, biochemical, immunochemical, electrical, optical or chemical means. Useful labels in the present invention include biotin for staining with labeled streptavidin conjugate, magnetic beads (e.g., Dynabeads™), fluorescent dyes (e.g., fluorescein, Texas Red, rhodamine, green fluorescent protein, and the like), radiolabels (e.g., ³H, ¹²⁵I, ³⁵S, ¹⁴C, or ³²P), enzymes (e.g., horseradish peroxidase, alkaline phosphatase and others commonly used in an ELISA), and colorimetric labels such as colloidal gold or colored glass or plastic (e.g., polystyrene, polypropylene, latex, etc.) beads. Patents teaching the use of such labels include U.S. Pat. Nos. 3,817,837; 3,850,752; 3,939,350; 3,996,345; 4,277,437; 4,275,149; and 4,366,241, the entireties of which are incorporated by reference herein.

Means of detecting such labels are well known to those of skill in the art. Thus, for example, radiolabels may be detected using photographic film or scintillation counters, and fluorescent markers may be detected using a photodetector to detect emitted light. Enzymatic labels are typically detected by providing the enzyme with a substrate and detecting the reaction product produced by the action of the enzyme on the substrate, and colorimetric labels are detected by simply visualizing the colored label.

The labels may be incorporated by any of a number of means well known to those of skill in the art. In one embodiment, the label is incorporated simultaneously during the amplification step in the preparation of the sample nucleic acids. Thus, for example, polymerase chain reaction (PCR) with labeled primers or labeled nucleotides will provide a labeled amplification product. In a preferred embodiment, transcription amplification, as described above, using a labeled nucleotide (e.g., fluorescein-labeled UTP and/or CTP) incorporates a label into the transcribed nucleic acids.

Alternatively, a label may be added directly to the original nucleic acid sample (e.g., mRNA, polyA mRNA, cDNA, etc.) or to the amplification product after the amplification is completed. Means of attaching labels to nucleic acids are well known to those of skill in the art and include, for example, nick translation or end-labeling (e.g., with a labeled RNA) by kinasing of the nucleic acid and subsequent attachment (ligation) of a nucleic acid linker joining the sample nucleic acid to a label (e.g., a fluorophore).

In another embodiment, the fluorescent modifications are by cyanine dyes, e.g., Cy-3/Cy-5 dUTP, Cy-3/Cy-5 dCTP (Amersham Pharmacia), or Alexa dyes (Khan et al., 1998, Cancer Res. 58:5009-5013).

In one embodiment, the two nucleic acid samples used for comparison are labeled with different fluorescent dyes that produce distinguishable detection signals; for example, nucleic acid samples made from normal intestinal cells are labeled with Cy5 and nucleic acid samples made from intestinal tissue cells are labeled with Cy3. The differently labeled target samples are hybridized to the same microarray simultaneously. In a preferred embodiment, the labeled nucleic acid samples are purified using methods known in the art, e.g., by ethanol purification or column purification.

In another embodiment, the nucleic acid samples will include one or more control molecules which hybridize to control probes on the microarray to normalize signals generated from the microarray. In one embodiment, labeled normalization nucleic acid samples are nucleic acid sequences that are perfectly complementary to control oligonucleotides that are spotted onto the microarray as described above. In another embodiment, labeled normalization nucleic acid samples are nucleic acid sequences that are 99%, 98%, 97%, 96%, 95%, 90%, 85%, 80% or 75% complementary to control oligonucleotides that are spotted onto the microarray as described above. The signals obtained from the normalization controls after hybridization provide a control for variations in hybridization conditions, label intensity, “reading” efficiency and other factors that may cause the signal of a perfect hybridization to vary between arrays. In one embodiment, signals (e.g., fluorescence intensity) read from all other probes in the array are divided by the signal (e.g., fluorescence intensity) from the control probes, thereby normalizing the measurements.
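
The division step described above can be sketched as follows. When several control probes are present, this sketch assumes their mean signal is used as the divisor; the probe identifiers are hypothetical. Because every probe on an array is divided by that array's own control level, a uniform change in label intensity or reading efficiency cancels out.

```python
from statistics import mean
from typing import Dict, Iterable


def normalize_to_controls(signals: Dict[str, float],
                          control_ids: Iterable[str]) -> Dict[str, float]:
    """Divide every non-control probe signal by the mean control-probe signal."""
    controls = set(control_ids)
    control_level = mean([signals[c] for c in controls])
    if control_level <= 0:
        raise ValueError("control signal must be positive")
    return {probe: value / control_level
            for probe, value in signals.items() if probe not in controls}


array_a = {"CTRL_1": 400.0, "CTRL_2": 600.0, "GENE_A": 1000.0, "GENE_B": 250.0}
print(normalize_to_controls(array_a, ["CTRL_1", "CTRL_2"]))
# {'GENE_A': 2.0, 'GENE_B': 0.5}
```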

Preferred normalization nucleic acid samples are selected to reflect the average length of the other nucleic acid samples present in the sample; however, they can also be selected to cover a range of lengths. The normalization control(s) can likewise be selected to reflect the (average) base composition of the other probes in the array; however, in one embodiment, only one or a few normalization probes are used, and they are selected such that they hybridize well (i.e., have no secondary structure and do not self-hybridize) and do not match any nucleic acids on the array.

Normalization probes are localized at any position in the array or at multiple positions throughout the array to control for spatial variation in hybridization efficiency. In one embodiment, normalization controls are located at the corners or edges of the array as well as in the middle.

Hybridization Conditions

Nucleic acid hybridization involves providing a nucleic acid sample under conditions where the sample and the complementary nucleic acid member can form stable hybrid duplexes through complementary base pairing. The nucleic acids that do not form hybrid duplexes are then washed away, leaving the hybridized nucleic acids to be detected, typically through detection of an attached detectable label. It is generally recognized that nucleic acids are denatured by increasing the temperature or decreasing the salt concentration of the buffer containing the nucleic acids. Under low stringency conditions (e.g., low temperature and/or high salt), hybrid duplexes (e.g., DNA:DNA, RNA:RNA, or RNA:DNA) will form even where the annealed sequences are not perfectly complementary. Thus, specificity of hybridization is reduced at lower stringency. Conversely, at higher stringency (e.g., higher temperature or lower salt), successful hybridization requires fewer mismatches.
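
The effect of temperature, salt and formamide on duplex stability can be illustrated with one commonly quoted empirical approximation for the melting temperature of a long DNA:DNA hybrid, Tm = 81.5 + 16.6 log10[Na+] + 0.41(%GC) − 0.63(%formamide) − 600/L, together with the rule of thumb that each percent of mismatch lowers Tm by roughly 1 °C. The coefficients differ slightly between sources, so this sketch is indicative only and is not a condition prescribed by the invention.

```python
import math


def duplex_tm(na_molar: float, gc_percent: float, formamide_percent: float,
              length_bp: int, mismatch_percent: float = 0.0) -> float:
    """Approximate Tm (deg C) of a long DNA:DNA hybrid.

    Tm = 81.5 + 16.6*log10[Na+] + 0.41*(%GC) - 0.63*(%formamide) - 600/L,
    minus roughly 1 deg C per percent mismatch (a rule of thumb).
    """
    if na_molar <= 0 or length_bp <= 0:
        raise ValueError("salt concentration and length must be positive")
    return (81.5 + 16.6 * math.log10(na_molar) + 0.41 * gc_percent
            - 0.63 * formamide_percent - 600.0 / length_bp
            - 1.0 * mismatch_percent)


high_salt = duplex_tm(1.0, 50, 0, 600)
low_salt = duplex_tm(0.1, 50, 0, 600)        # lower salt -> lower Tm (higher stringency)
with_formamide = duplex_tm(1.0, 50, 50, 600)  # formamide also lowers Tm
print(high_salt, low_salt, with_formamide)
```

Lowering the salt tenfold drops the estimated Tm by 16.6 °C, so the same hybridization temperature becomes more stringent, consistent with the statement that lower salt demands fewer mismatches.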

The invention provides for hybridization conditions comprising the DIG hybridization mix (Boehringer) or formamide-based hybridization solutions, for example as described in Ausubel et al., supra, and Sambrook et al., supra.

Methods of optimizing hybridization conditions are well known to those of skill in the art (see, e.g., Laboratory Techniques in Biochemistry and Molecular Biology, Vol. 24: Hybridization With Nucleic Acid Probes, P. Tijssen, ed. Elsevier, N.Y., (1993)).

Following hybridization, non-hybridized labeled or unlabeled nucleic acid is removed from the support surface, conveniently by washing, thereby generating a pattern of hybridized target nucleic acid on the substrate surface. A variety of wash solutions are known to those of skill in the art and may be used. The resultant hybridization patterns of labeled, hybridized oligonucleotides and/or nucleic acids may be visualized or detected in a variety of ways, with the particular manner of detection being chosen based on the particular label of the test nucleic acid; representative detection means include scintillation counting, autoradiography, fluorescence measurement, colorimetric measurement, light emission measurement and the like.

Image Acquisition and Data Analysis

Following hybridization and any washing step(s) and/or subsequent treatments, as described above, the resultant hybridization pattern is detected. In detecting or visualizing the hybridization pattern, the intensity or signal value of the label will be not only detected but also quantified, by which is meant that the signal from each spot of the hybridization will be measured and compared to a unit value corresponding to the signal emitted by a known number of end-labeled target nucleic acids to obtain a count or absolute value of the copy number of each end-labeled target that is hybridized to a particular spot on the array in the hybridization pattern.

Methods for analyzing the data collected from hybridization to arrays are well known in the art. For example, where detection of hybridization involves a fluorescent label, data analysis can include the steps of determining fluorescent intensity as a function of substrate position from the data collected, removing outliers (i.e., data deviating from a predetermined statistical distribution), and calculating the relative binding affinity of the test nucleic acids from the remaining data. The resulting data are displayed as an image with the intensity in each region varying according to the binding affinity between associated oligonucleotides and/or nucleic acids and the test nucleic acids.
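
One simple way to implement the outlier-removal step, assuming replicate spots of the same probe, is to drop values lying more than k sample standard deviations from their mean. The k-standard-deviation rule and the choice k = 2 are illustrative assumptions; the text only requires some predetermined statistical distribution.

```python
from statistics import mean, stdev
from typing import List, Sequence


def remove_outliers(values: Sequence[float], k: float = 2.0) -> List[float]:
    """Drop values more than k sample standard deviations from the mean."""
    if len(values) < 3:
        return list(values)  # too few replicates to judge
    m, s = mean(values), stdev(values)
    if s == 0:
        return list(values)
    return [v for v in values if abs(v - m) <= k * s]


replicates = [100.0, 102.0, 98.0, 101.0, 99.0, 100.0, 500.0]
kept = remove_outliers(replicates)
print(kept, mean(kept))
```

Note that a single extreme value inflates the standard deviation it is judged against, so with few replicates a robust alternative (for example, a median-based rule) may be preferable.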

The following detection protocol is used for the simultaneous analysis of two samples to be compared, where each sample is labeled with a different fluorescent dye.

Each element of the microarray is scanned for the first fluorescent color. The intensity of the fluorescence at each array element is proportional to the expression level of that gene in the sample. The scanning operation is repeated for the second fluorescent label. The ratio of the two fluorescent intensities provides a highly accurate and quantitative measurement of the relative gene expression level in the two samples.

In a preferred embodiment, fluorescence intensities of immobilized nucleic acid sequences were determined from images taken with a custom confocal microscope equipped with laser excitation sources and interference filters appropriate for the Cy3 and Cy5 fluors. Separate scans were taken for each fluor at a resolution of 225 μm² per pixel and 65,536 gray levels. Image segmentation to identify areas of hybridization, normalization of the intensities between the two fluor images, and calculation of the normalized mean fluorescent values at each target are as described (Khan, et al., 1998, Cancer Res. 58:5009-5013; Chen, et al., 1997, Biomed. Optics 2:364-374). Normalization between the images is used to adjust for the different efficiencies in labeling and detection with the two different fluors. This is achieved by equilibrating to a value of one the signal intensity ratio of a set of internal control genes spotted on the array.

In another preferred embodiment, the array is scanned in the Cy3 and Cy5 channels, and the scans are stored as separate 16-bit TIFF images. The images are incorporated and analyzed using software that includes a gridding process to capture the hybridization intensity data from each spot on the array. The fluorescence intensity and background-subtracted hybridization intensity of each spot are collected, and a ratio of the measured mean intensities of Cy5 to Cy3 is calculated. A linear regression approach is used for normalization and assumes that a scatter plot of the measured Cy5 versus Cy3 intensities should have a slope of one. The average of the ratios is calculated and used to rescale the data and adjust the slope to one. A ratio of expression not equal to 1 is used as an indication of differential gene expression.
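
The background subtraction, Cy5/Cy3 ratio and rescaling steps described above can be sketched as follows. This is a minimal illustration, not the analysis software referred to in the text: per-spot intensities and backgrounds are assumed to be already extracted by gridding, net intensities are clamped at a hypothetical floor of 1.0 to avoid division by zero, and the rescaling divides by the arithmetic mean of the ratios as the text describes.

```python
from statistics import mean
from typing import List, Sequence


def normalized_ratios(cy5: Sequence[float], cy3: Sequence[float],
                      cy5_bg: Sequence[float], cy3_bg: Sequence[float],
                      floor: float = 1.0) -> List[float]:
    """Background-subtract each spot, form Cy5/Cy3 ratios, then rescale so
    the average ratio is one (the slope-of-one assumption)."""
    if not cy5 or not (len(cy5) == len(cy3) == len(cy5_bg) == len(cy3_bg)):
        raise ValueError("all inputs must be non-empty and the same length")
    ratios = []
    for red, green, red_bg, green_bg in zip(cy5, cy3, cy5_bg, cy3_bg):
        # Clamp at a small floor so a spot at background cannot divide by zero.
        red_net = max(red - red_bg, floor)
        green_net = max(green - green_bg, floor)
        ratios.append(red_net / green_net)
    scale = mean(ratios)
    return [r / scale for r in ratios]


ratios = normalized_ratios([1100, 2100, 600], [1100, 1100, 1100],
                           [100, 100, 100], [100, 100, 100])
print([round(r, 3) for r in ratios])
```

Averaging log ratios instead of raw ratios is a common alternative, because it treats a two-fold increase and a two-fold decrease symmetrically.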

In a particular embodiment, where it is desired to quantify the transcription level (and thereby expression) of one or more nucleic acid sequences in a sample, the nucleic acid sample is one in which the concentration of the mRNA transcript(s) of the gene or genes, or the concentration of the nucleic acids derived from the mRNA transcript(s), is proportional to the transcription level (and therefore expression level) of that gene. Similarly, it is preferred that the hybridization signal intensity be proportional to the amount of hybridized nucleic acid. While it is preferred that the proportionality be relatively strict (e.g., a doubling in transcription rate results in a doubling in mRNA transcript in the sample nucleic acid pool and a doubling in hybridization signal), one of skill will appreciate that the proportionality can be more relaxed and even non-linear and still provide meaningful results. Thus, for example, an assay where a 5-fold difference in concentration of the sample mRNA results in a 3- to 6-fold difference in hybridization intensity is sufficient for most purposes. Where more precise quantification is required, appropriate controls are run to correct for variations introduced in sample preparation and hybridization as described herein. In addition, serial dilutions of “standard” mRNA samples are used to prepare calibration curves according to methods well known to those of skill in the art. Of course, where simple detection of the presence or absence of a transcript is desired, no elaborate control or calibration is required.
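
A calibration curve from serial dilutions of a “standard” mRNA can be built by fitting a straight line to log signal versus log concentration and inverting it for unknown samples. Working on log-log axes is an assumption of this sketch; it accommodates the relaxed, even non-linear, proportionality mentioned above, since a slope below one simply means the signal grows less than proportionally. The dilution series and signals are hypothetical.

```python
import math
from typing import Sequence, Tuple


def fit_calibration(concentrations: Sequence[float],
                    signals: Sequence[float]) -> Tuple[float, float]:
    """Least-squares line through (log10 concentration, log10 signal)."""
    xs = [math.log10(c) for c in concentrations]
    ys = [math.log10(s) for s in signals]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct concentrations")
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def estimate_concentration(signal: float, slope: float, intercept: float) -> float:
    """Invert the fitted curve to read an unknown sample off the standards."""
    return 10 ** ((math.log10(signal) - intercept) / slope)


# Hypothetical ten-fold serial dilution of a "standard" mRNA.
dilutions = [1.0, 10.0, 100.0, 1000.0]
standard_signal = [50.0, 500.0, 5000.0, 50000.0]
slope, intercept = fit_calibration(dilutions, standard_signal)
print(round(slope, 3), round(estimate_concentration(2500.0, slope, intercept), 2))
```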

For example, if a nucleic acid member on an array is not labeled after hybridization, this indicates that the gene comprising that nucleic acid member is not expressed in either sample. If a nucleic acid member is labeled with a single color, it indicates that a labeled gene was expressed only in one sample. The labeling of a nucleic acid member comprising an array with both colors indicates that the gene was expressed in both samples. Even genes expressed once per cell are detected (1 part in 100,000 sensitivity). A difference in expression intensity in the two samples being compared is indicative of differential expression, the ratio of the intensity in the two samples being not equal to 1.0, greater than 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 2.0, 3.0, 4.0 and the like, or less than 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2 and the like.
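
The interpretation rules above (no color, one color, both colors, then a ratio cut-off) can be expressed as a small decision function for one array element. The detection limit and the 1.5-fold cut-off are hypothetical parameters; any of the ratio thresholds listed above could be substituted.

```python
def call_spot(cy5: float, cy3: float, detection_limit: float,
              fold: float = 1.5) -> str:
    """Classify one array element from its two background-corrected intensities.

    Neither channel above the limit -> not expressed in either sample;
    one channel -> expressed in one sample only; both -> compare the ratio
    against a chosen fold-change cut-off.
    """
    red_on, green_on = cy5 > detection_limit, cy3 > detection_limit
    if not red_on and not green_on:
        return "not expressed"
    if red_on and not green_on:
        return "Cy5 sample only"
    if green_on and not red_on:
        return "Cy3 sample only"
    ratio = cy5 / cy3
    if ratio >= fold:
        return "higher in Cy5 sample"
    if ratio <= 1.0 / fold:
        return "higher in Cy3 sample"
    return "no differential expression"


for pair in [(50, 40), (500, 40), (3000, 1000), (1000, 3000), (1100, 1000)]:
    print(pair, call_spot(*pair, detection_limit=100))
```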

PCR

In one aspect of the invention, the level of expression of the RNA products of the biomarkers of the invention can be measured by amplifying the RNA products of the biomarkers from a sample, first using reverse transcription (RT). Either in combination, or as a second reaction step, the reverse transcribed product can then be amplified with the polymerase chain reaction (PCR). In accordance with one embodiment of the invention, the PCR can be QRT-PCR, as would be understood by a person skilled in the art.

Total RNA or mRNA from a sample is used as a template, and a primer specific to the transcribed portion of a biomarker of the invention is used to initiate reverse transcription. Methods of reverse transcribing RNA into cDNA are well known and described in Sambrook et al., 1989, supra. Primer design can be accomplished using commercially available software (e.g., Primer Designer 1.0, Scientific Software, etc.). The product of the reverse transcription is subsequently used as a template for PCR.

PCR provides a method for rapidly amplifying a particular nucleic acid sequence by using multiple cycles of DNA replication catalyzed by a thermostable, DNA-dependent DNA polymerase to amplify the target sequence of interest. PCR requires the presence of a nucleic acid to be amplified, two single-stranded oligonucleotide primers flanking the sequence to be amplified, a DNA polymerase, deoxyribonucleoside triphosphates, a buffer and salts.

The method of PCR is well known in the art. PCR is performed as described in Mullis and Faloona, 1987, Methods Enzymol., 155: 335, which is incorporated herein by reference. PCR is performed using template DNA (at least 1 fg; more usefully, 1-1000 ng) and at least 25 pmol of oligonucleotide primers. A typical reaction mixture includes: 2 μl of DNA, 25 pmol of oligonucleotide primer, 2.5 μl of 10× PCR buffer 1 (Perkin-Elmer, Foster City, Calif.), 0.4 μl of 1.25 mM dNTP, 0.15 μl (or 2.5 units) of Taq DNA polymerase (Perkin Elmer, Foster City, Calif.) and deionized water to a total volume of 25 μl. Mineral oil is overlaid and the PCR is performed using a programmable thermal cycler.
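Because the reaction mixture above is specified per reaction, a master mix for several reactions is usually prepared by scaling each component. The following sketch assumes the per-reaction volumes are in microliters, assumes a hypothetical 10 μM primer stock (so that 25 pmol corresponds to 2.5 μl), and adds a hypothetical 10% overage to cover pipetting losses. The template is left out of the mix and added to each tube separately.

```python
# Per-reaction volumes in microliters (assumption: the recipe's volumes are uL).
PER_REACTION_UL = {
    "10x PCR buffer": 2.5,
    "1.25 mM dNTP": 0.4,
    "Taq DNA polymerase": 0.15,
    "primers (25 pmol from a hypothetical 10 uM stock)": 2.5,
}
TEMPLATE_UL = 2.0  # template DNA, pipetted into each tube separately
TOTAL_UL = 25.0    # final reaction volume


def master_mix(n_reactions, overage=0.10):
    """Return component volumes (uL) for n reactions plus a fractional pipetting overage."""
    scale = n_reactions * (1 + overage)
    mix = {name: round(vol * scale, 2) for name, vol in PER_REACTION_UL.items()}
    # Water brings each reaction up to the final volume once template is added.
    water_per_rxn = TOTAL_UL - TEMPLATE_UL - sum(PER_REACTION_UL.values())
    mix["deionized water"] = round(water_per_rxn * scale, 2)
    return mix


for component, volume in master_mix(10).items():
    print(f"{component}: {volume} uL")
```

Under these assumptions each tube then receives 23 μl of master mix plus 2 μl of template.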

The length and temperature of each step of a PCR cycle, as well as the number of cycles, are adjusted according to the stringency requirements in effect. Annealing temperature and timing are determined both by the efficiency with which a primer is expected to anneal to a template and the degree of mismatch that is to be tolerated. The ability to optimize the stringency of primer annealing conditions is well within the knowledge of one of moderate skill in the art. An annealing temperature of between 30° C. and 72° C. is used. Initial denaturation of the template molecules normally occurs at between 92° C. and 99° C. for 4 minutes, followed by 20-40 cycles consisting of denaturation (94-99° C. for 15 seconds to 1 minute), annealing (temperature determined as discussed above; 1-2 minutes), and extension (72° C. for 1 minute). The final extension step is generally carried out for 4 minutes at 72° C., and may be followed by an indefinite (0-24 hour) step at 4° C.
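The cycling conditions above can be written down as a program and checked against the stated ranges. The specific temperatures and times below are illustrative values chosen from within those ranges (an assumption); the "normally" and "generally" 4-minute steps are treated as fixed for simplicity, and the run-time estimate ignores ramp times and the optional 4° C. hold.

```python
from dataclasses import dataclass


@dataclass
class Step:
    name: str
    temp_c: float
    seconds: int


# (temperature range in C, duration range in s) per step, from the text above.
RANGES = {
    "initial denaturation": ((92, 99), (240, 240)),
    "denaturation": ((94, 99), (15, 60)),
    "annealing": ((30, 72), (60, 120)),
    "extension": ((72, 72), (60, 60)),
    "final extension": ((72, 72), (240, 240)),
}


def out_of_range(steps, n_cycles):
    """List steps (and the cycle count) that fall outside the stated ranges."""
    problems = []
    for s in steps:
        (t_lo, t_hi), (d_lo, d_hi) = RANGES[s.name]
        if not (t_lo <= s.temp_c <= t_hi and d_lo <= s.seconds <= d_hi):
            problems.append(s.name)
    if not 20 <= n_cycles <= 40:
        problems.append("cycle count")
    return problems


def run_minutes(initial, cycle, n_cycles, final):
    """Total programmed hold time in minutes, ignoring ramping and the 4 C hold."""
    total_s = initial.seconds + n_cycles * sum(s.seconds for s in cycle) + final.seconds
    return total_s / 60


initial = Step("initial denaturation", 95, 240)
cycle = [Step("denaturation", 94, 30), Step("annealing", 55, 60), Step("extension", 72, 60)]
final = Step("final extension", 72, 240)
print(out_of_range([initial, *cycle, final], 30), run_minutes(initial, cycle, 30, final))
```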

QRT-PCR (quantitative real-time RT-PCR) can also be performed to provide a quantitative measure of gene expression levels. As with reverse transcription PCR, the reverse transcription and PCR steps of QRT-PCR can be performed in two steps, or reverse transcription combined with PCR can be performed concurrently. One of these techniques, for which there are commercially available kits such as Taqman (Perkin Elmer, Foster City, Calif.), is performed with a transcript-specific antisense probe. This probe is specific for the PCR product (e.g., a nucleic acid fragment derived from a gene) and is prepared with a fluorescent reporter attached to its 5′ end and a quencher attached to its 3′ end. Different fluorescent reporters can be used on different probes, allowing for measurement of two products in one reaction. When Taq DNA polymerase is activated, it cleaves off the fluorescent reporters of the probes bound to the template by virtue of its 5′-to-3′ exonuclease activity. Released from the quenchers, the reporters now fluoresce. The increase in reporter fluorescence is proportional to the amount of each specific product and is measured by a fluorometer; therefore, the amount of each color is measured and the PCR product is quantified. The PCR reactions can be performed in 96-well plates, 384-well plates and the like so that samples derived from many individuals are processed and measured simultaneously. The Taqman system has the additional advantage of not requiring gel electrophoresis and allows for quantification when used with a standard curve.
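Quantification with a standard curve, as mentioned for the Taqman system, is commonly done by plotting the threshold cycle (Ct, the cycle at which fluorescence crosses a set threshold) against the log of the known starting quantity of a dilution series. The following is one such approach, sketched under stated assumptions: the Ct values are hypothetical, and the efficiency formula E = 10^(-1/slope) - 1 is the conventional one for a ten-fold series rather than something specified in this application.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def standard_curve(quantities, cts):
    """Fit Ct = slope * log10(quantity) + intercept and derive amplification efficiency."""
    slope, intercept = fit_line([math.log10(q) for q in quantities], cts)
    efficiency = 10 ** (-1 / slope) - 1  # 1.0 corresponds to perfect doubling per cycle
    return slope, intercept, efficiency


def quantify(ct, slope, intercept):
    """Read an unknown sample's starting quantity off the standard curve."""
    return 10 ** ((ct - intercept) / slope)


# Hypothetical ten-fold dilution series of a standard and its measured Ct values.
std_qty = [1e1, 1e2, 1e3, 1e4, 1e5]
std_ct = [36.68, 33.36, 30.04, 26.72, 23.40]
slope, intercept, eff = standard_curve(std_qty, std_ct)
print(f"slope={slope:.2f} efficiency={eff:.1%} unknown={quantify(30.04, slope, intercept):.0f}")
```

A slope near -3.32 indicates close to 100% efficiency, i.e., roughly one doubling of product per cycle.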

A second technique useful for detecting PCR products quantitatively without electrophoresis is to use an intercalating dye such as the commercially available QuantiTect SYBR Green PCR (Qiagen, Valencia, Calif.). QRT-PCR is performed using SYBR Green as a fluorescent label, which is incorporated into the PCR product during the PCR stage and produces a fluorescence proportional to the amount of PCR product.

Both Taqman and QuantiTect SYBR systems can be used subsequent to reverse transcription of RNA. Reverse transcription can either be performed in the same reaction mixture as the PCR step (one-step protocol) or reverse transcription can be performed first prior to amplification utilizing PCR (two-step protocol).

Additionally, other systems to quantitatively measure mRNA expression products are known, including Molecular Beacons®, which use a probe having a fluorescent molecule and a quencher molecule, the probe being capable of forming a hairpin structure such that, when in the hairpin form, the fluorescent molecule is quenched, and when hybridized, the fluorescence increases, giving a quantitative measurement of gene expression.

Additional techniques to quantitatively measure RNA expression include, but are not limited to, polymerase chain reaction, ligase chain reaction, Qbeta replicase (see, e.g., International Application No. PCT/US87/00880), isothermal amplification method (see, e.g., Walker et al. (1992) PNAS 89:382-396), strand displacement amplification (SDA), repair chain reaction, Asymmetric Quantitative PCR (see, e.g., U.S. Publication No. US20030134307A1) and the multiplex microsphere bead assay described in Fuja et al., 2004, Journal of Biotechnology 108:193-205.

The level of gene expression can be measured by amplifying RNA from a sample using transcription-based amplification systems (TAS), including nucleic acid sequence-based amplification (NASBA) and 3SR. See, e.g., Kwoh et al. (1989) PNAS USA 86:1173; International Publication No. WO 88/10315; and U.S. Pat. No. 6,329,179. In NASBA, the nucleic acids may be prepared for amplification using conventional phenol/chloroform extraction, heat denaturation, treatment with lysis buffer and minispin columns for isolation of DNA and RNA, or guanidinium chloride extraction of RNA. These amplification techniques involve annealing a primer that has target-specific sequences. Following polymerization, DNA/RNA hybrids are digested with RNase H while double-stranded DNA molecules are heat-denatured again. In either case the single-stranded DNA is made fully double-stranded by addition of a second target-specific primer, followed by polymerization. The double-stranded DNA molecules are then multiply transcribed by a polymerase such as T7 or SP6. In an isothermal cyclic reaction, the RNAs are reverse transcribed into double-stranded DNA and transcribed once again with a polymerase such as T7 or SP6. The resulting products, whether truncated or complete, indicate target-specific sequences.

Several techniques may be used to separate amplification products. For example, amplification products may be separated by agarose, agarose-acrylamide or polyacrylamide gel electrophoresis using conventional methods. See Sambrook et al., 1989. Several techniques for detecting PCR products quantitatively without electrophoresis may also be used according to the invention (see, for example, PCR Protocols, A Guide to Methods and Applications, Innis et al., Academic Press, Inc., N.Y. (1990)). For example, chromatographic techniques may be employed to effect separation. There are many kinds of chromatography which may be used in the present invention: adsorption, partition, ion-exchange and molecular sieve, HPLC, and many specialized techniques for using them including column, paper, thin-layer and gas chromatography (Freifelder, Physical Biochemistry Applications to Biochemistry and Molecular Biology, 2nd ed., Wm. Freeman and Co., New York, N.Y., 1982).

Another example of a separation methodology is done by covalently labeling the oligonucleotide primers used in a PCR reaction with various types of small molecule ligands. In one such separation, a different ligand is present on each oligonucleotide. A molecule that specifically binds to one of the ligands (for example, an antibody, or avidin if the ligand is biotin) is used to coat the surface of a plate such as a 96-well ELISA plate. Upon application of the PCR reactions to the surface of such a prepared plate, the PCR products are bound with specificity to the surface. After washing the plate to remove unbound reagents, a solution containing a second molecule that binds to the other ligand is added. This second molecule is linked to a reporter system. The second molecule only binds to the plate if a PCR product has been produced whereby both oligonucleotide primers are incorporated into the final PCR products. The amount of the PCR product is then detected and quantified in a commercial plate reader much as ELISA reactions are detected and quantified. An ELISA-like system such as the one described here has been developed by the Raggio Italgene company under the C-Track trade name.

Amplification products must be visualized in order to confirm amplification of the nucleic acid sequences of interest. One typical visualization method involves staining of a gel with ethidium bromide and visualization under UV light. Alternatively, if the amplification products are integrally labeled with radio- or fluorometrically-labeled nucleotides, the amplification products may be exposed to x-ray film or visualized under the appropriate stimulating spectra following separation.

In one embodiment, visualization is achieved indirectly. Following separation of amplification products, a labeled nucleic acid probe is brought into contact with the amplified nucleic acid sequence of interest. The probe in one embodiment is conjugated to a chromophore but may be radiolabeled. In another embodiment, the probe is conjugated to a binding partner, such as an antibody or biotin, where the other member of the binding pair carries a detectable moiety.

In another embodiment, detection is by Southern blotting and hybridization with a labeled probe. The techniques involved in Southern blotting are well known to those of skill in the art and may be found in many standard books on molecular protocols. See Sambrook et al., 1989, supra. Briefly, amplification products are separated by gel electrophoresis. The gel is then contacted with a membrane, such as nitrocellulose, permitting transfer of the nucleic acid and non-covalent binding. Subsequently, the membrane is incubated with a chromophore-conjugated probe that is capable of hybridizing with a target amplification product. Detection is by exposure of the membrane to x-ray film or ion-emitting detection devices.

One example of the foregoing is described in U.S. Pat. No. 5,279,721, incorporated by reference herein, which discloses an apparatus and method for the automated electrophoresis and transfer of nucleic acids. The apparatus permits electrophoresis and blotting without external manipulation of the gel and is ideally suited to carrying out methods according to the present invention.

In one embodiment of the invention, the primers and probes in Table 4, 6, 16 or 17 are used to measure the expression of the biomarkers of the invention.

Nuclease Protection Assays

In another embodiment of the invention, nuclease protection assays (including both ribonuclease protection assays and S1 nuclease assays) can be used to detect and quantitate the RNA products of the biomarkers of the invention. In nuclease protection assays, an antisense probe (e.g., radiolabeled or nonisotopically labeled) hybridizes in solution to an RNA sample. Following hybridization, single-stranded, unhybridized probe and RNA are degraded by nucleases. An acrylamide gel is used to separate the remaining protected fragments. Typically, solution hybridization is more efficient than membrane-based hybridization, and it can accommodate up to 100 μg of sample RNA, compared with the 20-30 μg maximum of blot hybridizations.

The ribonuclease protection assay, which is the most common type of nuclease protection assay, requires the use of RNA probes. Oligonucleotides and other single-stranded DNA probes can only be used in assays containing S1 nuclease. The single-stranded, antisense probe must typically be completely homologous to the target RNA to prevent cleavage of the probe:target hybrid by nuclease.

Northern Blots

A standard Northern blot assay can also be used to ascertain an RNA transcript size, identify alternatively spliced RNA transcripts, and determine the relative amounts of RNA products of the biomarkers of the invention, in accordance with conventional Northern hybridization techniques known to those persons of ordinary skill in the art. In Northern blots, RNA samples are first separated by size via electrophoresis in an agarose gel under denaturing conditions. The RNA is then transferred to a membrane, crosslinked and hybridized with a labeled probe. Nonisotopic or high specific activity radiolabeled probes can be used, including random-primed, nick-translated, or PCR-generated DNA probes, in vitro transcribed RNA probes, and oligonucleotides. Additionally, sequences with only partial homology (e.g., cDNA from a different species or genomic DNA fragments that might contain an exon) may be used as probes. The labeled probe, e.g., a radiolabeled cDNA, either containing the full-length, single-stranded DNA or a fragment of that DNA sequence, may be at least 20, at least 30, at least 50, or at least 100 consecutive nucleotides in length. The probe can be labeled by any of the many different methods known to those skilled in this art. The labels most commonly employed for these studies are radioactive elements, enzymes, chemicals that fluoresce when exposed to ultraviolet light, and others. A number of fluorescent materials are known and can be utilized as labels. These include, but are not limited to, fluorescein, rhodamine, auramine, Texas Red, AMCA blue and Lucifer Yellow. A particular detecting material is anti-rabbit antibody prepared in goats and conjugated with fluorescein through an isothiocyanate. Proteins can also be labeled with a radioactive element or with an enzyme. The radioactive label can be detected by any of the currently available counting procedures. Non-limiting examples of isotopes include ³H, ¹⁴C, ³²P, ³⁵S, ³⁶Cl, ⁵¹Cr, ⁵⁷Co, ⁵⁸Co, ⁵⁹Fe, ⁹⁰Y, ¹²⁵I, ¹³¹I, and ¹⁸⁶Re. Enzyme labels are likewise useful, and can be detected by any of the presently utilized colorimetric, spectrophotometric, fluorospectrophotometric, amperometric or gasometric techniques. The enzyme is conjugated to the selected particle by reaction with bridging molecules such as carbodiimides, diisocyanates, glutaraldehyde and the like. Any enzymes known to one of skill in the art can be utilized. Examples of such enzymes include, but are not limited to, peroxidase, beta-D-galactosidase, urease, glucose oxidase plus peroxidase and alkaline phosphatase. U.S. Pat. Nos. 3,654,090, 3,850,752, and 4,016,043 are referred to by way of example for their disclosure of alternative labeling materials and methods.
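The probe lengths above can be illustrated with a short sketch that tiles a target sequence into windows of consecutive nucleotides and returns the antisense (reverse-complement) probe for each window. The function names are hypothetical, the target is assumed to be written in the DNA alphabet (A/C/G/T), and no screening for GC content, secondary structure or cross-hybridization is attempted.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def antisense_probes(target, length=20, step=1):
    """Return (start, probe) for every window of `length` consecutive nucleotides.

    Each probe is the reverse complement of its window, so it can hybridize to a
    sense-strand transcript. The 20-nt minimum mirrors the shortest length above.
    """
    if length < 20:
        raise ValueError("probes are described as at least 20 nucleotides long")
    target = target.upper()
    return [(i, reverse_complement(target[i:i + length]))
            for i in range(0, len(target) - length + 1, step)]


for start, probe in antisense_probes("AAAACCCCGGGGTTTTAAAACCCC"):
    print(start, probe)
```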

(N) Techniques to Measure the Protein Products of the Biomarkers of the Invention

Antibody Based Methodologies

Standard techniques can also be utilized for determining the amount of the protein or proteins of interest present in a sample. For example, standard techniques can be employed using, e.g., immunoassays such as Western blot, immunoprecipitation followed by sodium dodecyl sulfate polyacrylamide gel electrophoresis (SDS-PAGE), immunocytochemistry, and the like to determine the amount of the protein or proteins of interest present in a sample. A preferred agent for detecting a protein of interest is an antibody capable of binding to a protein of interest, in one embodiment an antibody with a detectable label.

For such detection methods, protein from the sample to be analyzed can easily be isolated using techniques which are well known to those of skill in the art. Protein isolation methods can, for example, be such as those described in Harlow and Lane (Harlow, E. and Lane, D., Antibodies: A Laboratory Manual, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. (1988)).

In some embodiments, methods for the detection of the protein or proteins of interest involve their detection via interaction with a protein-specific antibody. For example, antibodies directed against a protein of interest can be utilized as described herein. Antibodies can be generated utilizing standard techniques well known to those of skill in the art. See, e.g., Section 15.13.2 of this application and Section 5.2 of U.S. Publication No. 20040018200, which is incorporated herein by reference, for a more detailed discussion of such antibody generation techniques. Briefly, such antibodies can be polyclonal or monoclonal. An intact antibody, or an antibody fragment (e.g., Fab or F(ab′)₂), can, for example, be used. In some embodiments, the antibody is a human or humanized antibody.

Table 5 and Table 15 show, in one embodiment of the invention, antibodies which are used to detect the protein products of the biomarkers of the invention.

For example, antibodies, or fragments of antibodies, specific for a protein of interest can be used to quantitatively or qualitatively detect the presence of the protein. This can be accomplished, for example, by immunofluorescence techniques. Antibodies (or fragments thereof) can, additionally, be employed histologically, as in immunofluorescence or immunoelectron microscopy, for in situ detection of a protein of interest. In situ detection can be accomplished by removing a histological specimen (e.g., a biopsy specimen) from a patient and applying thereto a labeled antibody that is directed to the protein. The antibody (or fragment) can be applied by overlaying the labeled antibody (or fragment) onto a biological sample. Through the use of such a procedure, it is possible to determine not only the presence of the protein of interest, but also its distribution, such as its presence in particular cells (e.g., intestinal cells and lymphocytes) within the sample. A wide variety of well-known histological methods (such as staining procedures) can be utilized in order to achieve such in situ detection.

Immunoassays for a protein of interest typically comprise incubating a biological sample in the presence of a detectably labeled antibody capable of identifying a protein of interest, and detecting the bound antibody by any of a number of techniques well known in the art. As discussed in more detail below, the term “labeled” can refer to direct labeling of the antibody via, e.g., coupling (i.e., physically linking) a detectable substance to the antibody, and can also refer to indirect labeling of the antibody by reactivity with another reagent that is directly labeled. Examples of indirect labeling include detection of a primary antibody using a fluorescently labeled secondary antibody.

For example, the biological sample can be brought in contact with and immobilized onto a solid phase support or carrier such as nitrocellulose, or other support which is capable of immobilizing cells, cell particles or soluble proteins. The support can then be washed with suitable buffers followed by treatment with the detectably labeled fingerprint gene-specific antibody. The solid phase support can then be washed with the buffer a second time to remove unbound antibody. The amount of bound label on the support can then be detected by conventional means.

By “solid phase support or carrier” in the context of proteinaceous agents is intended any support capable of binding an antigen or an antibody. Well-known supports or carriers include glass, polystyrene, polypropylene, polyethylene, dextran, nylon, amylases, natural and modified celluloses, polyacrylamides, gabbros, and magnetite. The nature of the carrier can be either soluble to some extent or insoluble for the purposes of the present invention. The support material can have virtually any possible structural configuration so long as the coupled molecule is capable of binding to an antigen or antibody. Thus, the support configuration can be spherical, as in a bead, or cylindrical, as in the inside surface of a test tube, or the external surface of a rod. Alternatively, the surface can be flat such as a sheet, test strip, etc. Preferred supports include polystyrene beads. Those skilled in the art will know many other suitable carriers for binding antibody or antigen, or will be able to ascertain the same by use of routine experimentation.

One of the ways in which a specific antibody can be detectably labeled is by linking it to an enzyme for use in an enzyme immunoassay (EIA) (Voller, A., “The Enzyme Linked Immunosorbent Assay (ELISA)”, 1978, Diagnostic Horizons 2:1-7, Microbiological Associates Quarterly Publication, Walkersville, Md.; Voller, A. et al., 1978, J. Clin. Pathol. 31:507-520; Butler, J. E., 1981, Meth. Enzymol. 73:482-523; Maggio, E. (ed.), 1980, Enzyme Immunoassay, CRC Press, Boca Raton, Fla.; Ishikawa, E. et al., (eds.), 1981, Enzyme Immunoassay, Igaku Shoin, Tokyo). The enzyme which is bound to the antibody will react with an appropriate substrate, in one embodiment a chromogenic substrate, in such a manner as to produce a chemical moiety which can be detected, for example, by spectrophotometric, fluorimetric or visual means. Enzymes which can be used to detectably label the antibody include, but are not limited to, malate dehydrogenase, staphylococcal nuclease, delta-5-steroid isomerase, yeast alcohol dehydrogenase, alpha-glycerophosphate dehydrogenase, triose phosphate isomerase, horseradish peroxidase, alkaline phosphatase, asparaginase, glucose oxidase, beta-galactosidase, ribonuclease, urease, catalase, glucose-6-phosphate dehydrogenase, glucoamylase and acetylcholinesterase. The detection can be accomplished by colorimetric methods which employ a chromogenic substrate for the enzyme. Detection can also be accomplished by visual comparison of the extent of enzymatic reaction of a substrate in comparison with similarly prepared standards.
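Comparison with similarly prepared standards can also be done numerically rather than visually. The following sketch assumes a set of standards with known concentrations and measured signals (e.g., absorbances) that increase monotonically with concentration, and estimates an unknown by linear interpolation between the two bracketing standards; values outside the standard range are not extrapolated. Function names, units and numbers are hypothetical.

```python
from bisect import bisect_left


def interpolate_concentration(signal, standards):
    """Estimate concentration from a signal by linear interpolation between standards.

    standards: (concentration, signal) pairs whose signal rises with concentration
    (assumption). Returns None when the signal lies outside the standard range.
    """
    points = sorted(standards, key=lambda p: p[1])
    signals = [s for _, s in points]
    if not signals[0] <= signal <= signals[-1]:
        return None                     # refuse to extrapolate
    i = bisect_left(signals, signal)    # first standard with signal >= the unknown
    if signals[i] == signal:
        return points[i][0]
    (c0, s0), (c1, s1) = points[i - 1], points[i]
    return c0 + (c1 - c0) * (signal - s0) / (s1 - s0)


# Hypothetical standards: (concentration in ng/ml, absorbance).
STANDARDS = [(0, 0.05), (10, 0.25), (20, 0.45), (40, 0.85)]
print(interpolate_concentration(0.35, STANDARDS))
```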

Detection can also be accomplished using any of a variety of other immunoassays. For example, by radioactively labeling the antibodies or antibody fragments, it is possible to detect a protein of interest through the use of a radioimmunoassay (RIA) (see, for example, Weintraub, B., Principles of Radioimmunoassays, Seventh Training Course on Radioligand Assay Techniques, The Endocrine Society, March 1986, which is incorporated by reference herein). The radioactive isotope (e.g., ¹²⁵I, ¹³¹I, ³⁵S or ³H) can be detected by such means as the use of a gamma counter or a scintillation counter or by autoradiography.

It is also possible to label the antibody with a fluorescent compound. When the fluorescently labeled antibody is exposed to light of the proper wavelength, its presence can then be detected due to fluorescence. Among the most commonly used fluorescent labeling compounds are fluorescein isothiocyanate, rhodamine, phycoerythrin, phycocyanin, allophycocyanin, o-phthalaldehyde and fluorescamine.

The antibody can also be detectably labeled using fluorescence-emitting metals such as ¹⁵²Eu, or others of the lanthanide series. These metals can be attached to the antibody using such metal chelating groups as diethylenetriaminepentaacetic acid (DTPA) or ethylenediaminetetraacetic acid (EDTA).

The antibody also can be detectably labeled by coupling it to a chemiluminescent compound. The presence of the chemiluminescent-tagged antibody is then determined by detecting the presence of luminescence that arises during the course of a chemical reaction. Examples of particularly useful chemiluminescent labeling compounds are luminol, isoluminol, aromatic acridinium ester, imidazole, acridinium salt and oxalate ester.

Likewise, a bioluminescent compound can be used to label the antibody of the present invention. Bioluminescence is a type of chemiluminescence found in biological systems in which a catalytic protein increases the efficiency of the chemiluminescent reaction. The presence of a bioluminescent protein is determined by detecting the presence of luminescence. Important bioluminescent compounds for purposes of labeling are luciferin, luciferase and aequorin.

Protein Arrays

Polypeptides which specifically and/or selectively bind to the protein products of the biomarkers of the invention can be immobilized on a protein array. The protein array can be used as a tool, e.g., to test individual samples (such as isolated cells, tissue, lymph, lymph tissue, blood, synovial fluid, sera, biopsies, and the like) for the presence of the protein products of the biomarkers of the invention. The protein array can also include antibodies as well as other ligands, e.g., ligands that bind to the polypeptides encoded by the biomarkers of the invention.

Methods of producing polypeptide arrays are described, e.g., in De Wildt et al., 2000, Nature Biotech. 18:989-994; Lueking et al., 1999, Anal. Biochem. 270:103-111; Ge, 2000, Nuc. Acids Res. 28:e3; MacBeath and Schreiber, 2000, Science 289:1760-1763; International Publication Nos. WO 01/40803 and WO 99/51773A1; and U.S. Pat. No. 6,406,921. Polypeptides for the array can be spotted at high speed, e.g., using commercially available robotic apparatus, e.g., from Genetic MicroSystems and Affymetrix (Santa Clara, Calif., USA) or BioRobotics (Cambridge, UK). The array substrate can be, for example, nitrocellulose, plastic, glass, e.g., surface-modified glass. The array can also include a porous matrix, e.g., acrylamide, agarose, or another polymer.

For example, the array can be an array of antibodies, e.g., as described in De Wildt, supra. Cells that produce the polypeptide ligands can be grown on a filter in an arrayed format. Polypeptide production is induced, and the expressed antibodies are immobilized to the filter at the location of the cell. Information about the extent of binding at each address of the array can be stored as a profile, e.g., in a computer database.
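Storing the extent of binding at each array address as a profile can be as simple as one table keyed by sample and address. The following sketch uses Python's built-in sqlite3 module with an in-memory database; the table layout, column names and example values are hypothetical.

```python
import sqlite3


def store_profile(conn, sample_id, profile):
    """Store a {array address: binding signal} profile for one sample."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS binding (
               sample_id TEXT, address TEXT, signal REAL,
               PRIMARY KEY (sample_id, address))""")
    # INSERT OR REPLACE lets a re-scanned array overwrite earlier readings.
    conn.executemany(
        "INSERT OR REPLACE INTO binding VALUES (?, ?, ?)",
        [(sample_id, address, signal) for address, signal in profile.items()])
    conn.commit()


def load_profile(conn, sample_id):
    """Read a stored profile back as a dict ordered by address."""
    rows = conn.execute(
        "SELECT address, signal FROM binding WHERE sample_id = ? ORDER BY address",
        (sample_id,))
    return dict(rows.fetchall())


conn = sqlite3.connect(":memory:")
store_profile(conn, "S1", {"A1": 1200.0, "A2": 85.5})
print(load_profile(conn, "S1"))
```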

In one embodiment the array is an array of the protein products of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, or all, or any combination of the biomarkers of the invention. In one aspect, the invention provides for antibodies bound to an array which selectively bind to the protein products of the biomarkers of the invention.
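The number of possible panels grows quickly with the number of biomarkers. Assuming, for illustration only, a set of 18 biomarkers, the number of panels containing exactly k of them is the binomial coefficient C(18, k), and the number of non-empty panels of any size is 2^18 - 1 = 262,143. The marker labels in the sketch are hypothetical placeholders.

```python
from itertools import combinations
from math import comb

N_BIOMARKERS = 18  # illustrative set size (assumption)


def panel_count(k, n=N_BIOMARKERS):
    """Number of distinct panels containing exactly k of n biomarkers."""
    return comb(n, k)


# Sum over all panel sizes 1..n equals 2**n - 1 (every non-empty subset).
total_panels = sum(panel_count(k) for k in range(1, N_BIOMARKERS + 1))

# Enumerate, e.g., every two-marker panel from hypothetical marker labels.
markers = [f"marker_{i}" for i in range(1, N_BIOMARKERS + 1)]
pairs = list(combinations(markers, 2))
print(total_panels, panel_count(3), len(pairs))  # 262143 816 153
```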

(O) Protein Production

Standard recombinant nucleic acid methods can be used to express a polypeptide or antibody of the invention (e.g., a protein product of a biomarker of the invention). Generally, a nucleic acid sequence encoding the polypeptide is cloned into a nucleic acid expression vector. Of course, if the protein includes multiple polypeptide chains, each chain must be cloned into an expression vector, e.g., the same or different vectors, that are expressed in the same or different cells. If the protein is sufficiently small, i.e., the protein is a peptide of fewer than 50 amino acids, the protein can be synthesized using automated organic synthetic methods. Polypeptides comprising the 5′ region, 3′ region or internal coding region of a biomarker of the invention are expressed from nucleic acid expression vectors containing only those nucleotide sequences corresponding to the 5′ region, 3′ region or internal coding region of a biomarker of the invention. Methods for producing antibodies directed to protein products of a biomarker of the invention, or to polypeptides encoded by the 5′ region, 3′ region or internal coding regions of a biomarker of the invention, are also encompassed.

The expression vector for expressing the polypeptide can include, in addition to the segment encoding the polypeptide or fragment thereof, regulatory sequences, including, for example, a promoter operably linked to the nucleic acid(s) of interest. Large numbers of suitable vectors and promoters are known to those of skill in the art and are commercially available for generating the recombinant constructs of the present invention. The following vectors are provided by way of example. Bacterial: pBs, phagescript, PsiX174, pBluescript SK, pBs KS, pNH8a, pNH16a, pNH18a, pNH46a (Stratagene, La Jolla, Calif., USA); pTrc99A, pKK223-3, pKK233-3, pDR540, and pRIT5 (Pharmacia, Uppsala, Sweden). Eukaryotic: pWLneo, pSV2cat, pOG44, PXTI, pSG (Stratagene); pSVK3, pBPV, pMSG, and pSVL (Pharmacia). One preferred class of libraries is the display library, which is described below.

Methods well known to those skilled in the art can be used to construct vectors containing a polynucleotide of the invention and appropriate transcriptional/translational control signals. These methods include in vitro recombinant DNA techniques, synthetic techniques and in vivo recombination/genetic recombination. See, for example, the techniques described in Sambrook & Russell, Molecular Cloning: A Laboratory Manual, 3rd Edition, Cold Spring Harbor Laboratory, N.Y. (2001) and Ausubel et al., Current Protocols in Molecular Biology (Greene Publishing Associates and Wiley Interscience, N.Y.) (1989). Promoter regions can be selected from any desired gene using CAT (chloramphenicol acetyltransferase) vectors or other vectors with selectable markers. Two appropriate vectors are pKK232-8 and pCM7. Particular named bacterial promoters include lacI, lacZ, T3, T7, gpt, lambda P, and trc. Eukaryotic promoters include CMV immediate early, HSV thymidine kinase, early and late SV40, LTRs from retrovirus, mouse metallothionein-I, and various art-known tissue-specific promoters. In specific embodiments, the promoter is an inducible promoter. In other embodiments, the promoter is a constitutive promoter. In yet other embodiments, the promoter is a tissue-specific promoter.

Generally, recombinant expression vectors will include origins of replication and selectable markers permitting transformation of the host cell, e.g., the ampicillin resistance gene of E. coli and S. cerevisiae auxotrophic markers (such as URA3, LEU2, HIS3, and TRP1 genes), and a promoter derived from a highly expressed gene to direct transcription of a downstream structural sequence. Such promoters can be derived from operons encoding glycolytic enzymes such as 3-phosphoglycerate kinase (PGK), α-factor, acid phosphatase, or heat shock proteins, among others. The polynucleotide of the invention is assembled in appropriate phase with translation initiation and termination sequences and, in some embodiments, a leader sequence capable of directing secretion of translated protein into the periplasmic space or extracellular medium. Optionally, a nucleic acid of the invention can encode a fusion protein including an N-terminal identification peptide imparting desired characteristics, e.g., stabilization or simplified purification of expressed recombinant product. Useful expression vectors for bacteria are constructed by inserting a polynucleotide of the invention together with suitable translation initiation and termination signals, optionally in operable reading phase with a functional promoter. The vector will comprise one or more phenotypic selectable markers and an origin of replication to ensure maintenance of the vector and, if desirable, to provide amplification within the host. Suitable prokaryotic hosts for transformation include E. coli, Bacillus subtilis, Salmonella typhimurium and various species within the genera Pseudomonas, Streptomyces, and Staphylococcus, although others may also be employed as a matter of choice.

As a representative but nonlimiting example, useful expression vectors for bacteria can comprise a selectable marker and bacterial origin of replication derived from commercially available plasmids comprising genetic elements of the well known cloning vector pBR322 (ATCC 37017). Such commercial vectors include, for example, pKK223-3 (Pharmacia Fine Chemicals, Uppsala, Sweden) and pGEM1 (Promega, Madison, Wis., USA).

The present invention provides host cells genetically engineered to contain the polynucleotides of the invention. For example, such host cells may contain nucleic acids of the invention introduced into the host cell using known transformation, transfection or infection methods. The present invention also provides host cells genetically engineered to express the polynucleotides of the invention, wherein such polynucleotides are in operative association with a regulatory sequence heterologous to the host cell which drives expression of the polynucleotides in the cell.

The present invention further provides host cells containing the vectors of the present invention, wherein the nucleic acid has been introduced into the host cell using known transformation, transfection or infection methods. The host cell can be a eukaryotic host cell, such as a mammalian cell, a lower eukaryotic host cell, such as a yeast cell, or the host cell can be a prokaryotic cell, such as a bacterial cell. Introduction of the recombinant construct into the host cell can be effected, for example, by calcium phosphate transfection, DEAE-dextran-mediated transfection, or electroporation (Davis, L. et al., Basic Methods in Molecular Biology (1986)). Cell-free translation systems can also be employed to produce such proteins using RNAs derived from the DNA constructs of the present invention.

Any host/vector system can be used to express one or more of the protein products of the biomarkers of the invention, including those listed in Table 3 and/or Table 13. Appropriate cloning and expression vectors for use with prokaryotic and eukaryotic hosts are described by Sambrook et al., in Molecular Cloning: A Laboratory Manual, Second Edition, Cold Spring Harbor, New York (1989), the disclosure of which is incorporated herein by reference in its entirety. The most preferred host cells are those that do not normally express the particular polypeptide or that express the polypeptide at a low natural level.

In a specific embodiment, the host cells are engineered to express an endogenous gene comprising the polynucleotides of the invention under the control of inducible regulatory elements, in which case the regulatory sequences of the endogenous gene may be replaced by homologous recombination. As described herein, gene targeting can be used to replace a gene's existing regulatory region with a regulatory sequence isolated from a different gene or with a novel regulatory sequence synthesized by genetic engineering methods. Such regulatory sequences may comprise promoters, enhancers, scaffold-attachment regions, negative regulatory elements, transcriptional initiation sites, regulatory protein binding sites, or combinations of these sequences. Alternatively, sequences that affect the structure or stability of the RNA or protein produced, including polyadenylation signals, mRNA stability elements, splice sites, leader sequences for enhancing or modifying transport or secretion properties of the protein, or other sequences that alter or improve the function or stability of protein or RNA molecules, may be replaced, removed, added, or otherwise modified by targeting.

The host of the present invention may also be a yeast or other fungus. In yeast, a number of vectors containing constitutive or inducible promoters may be used. For a review, see Ausubel et al. (eds), Current Protocols in Molecular Biology, Vol. 2, Greene Publish. Assoc. & Wiley Interscience, Ch. 13 (1988); Grant et al., 1987, “Expression and Secretion Vectors for Yeast”, Methods Enzymol. 153:516-544; Glover, DNA Cloning, Vol. II, IRL Press, Wash., D.C., Ch. 3 (1986); Bitter, 1987, “Heterologous Gene Expression in Yeast”, Methods Enzymol. 152:673-684; and Strathern et al. (eds), The Molecular Biology of the Yeast Saccharomyces, Cold Spring Harbor Press, Vols. I and II (1982).

Potentially suitable yeast strains include Saccharomyces cerevisiae, Schizosaccharomyces pombe, Kluyveromyces strains, Candida, or any yeast strain capable of expressing heterologous proteins. Potentially suitable bacterial strains include Escherichia coli; Enterobacteriaceae such as Serratia marcescens; bacilli such as Bacillus subtilis; Salmonella typhimurium; pseudomonads; or any bacterial strain capable of expressing heterologous proteins. If the protein is made in yeast or bacteria, it may be necessary to modify the protein produced therein, for example by phosphorylation or glycosylation of the appropriate sites, in order to obtain the functional protein. Such covalent attachments may be accomplished using known chemical or enzymatic methods.

Various mammalian cell culture systems can also be employed to express recombinant protein. Examples of mammalian expression systems include monkey COS cells such as the COS-7 line of monkey kidney fibroblasts, described by Gluzman, Cell 23:175 (1981); Chinese Hamster Ovary (CHO) cells; human kidney 293 cells; human epidermal A431 cells; human Colo205 cells; 3T3 cells; CV-1 cells; normal diploid cells; cell strains derived from in vitro culture of primary tissue; primary explants; HeLa cells; mouse L cells; BHK, HL-60, U937, HaK, C127, or Jurkat cells; and other cell lines capable of expressing a compatible vector. Mammalian expression vectors will comprise an origin of replication, a suitable promoter, and any necessary ribosome-binding sites, polyadenylation site, splice donor and acceptor sites, transcriptional termination sequences, and 5′ flanking nontranscribed sequences.

Microbial cells employed in expression of proteins can be disrupted by any convenient method, including freeze-thaw cycling, sonication, mechanical disruption, or use of cell-lysing agents. Recombinant polypeptides produced in bacterial culture are usually isolated by initial extraction from cell pellets, followed by one or more salting-out, aqueous ion-exchange, or size-exclusion chromatography steps. In some embodiments, the template nucleic acid also encodes a polypeptide tag, e.g., penta- or hexa-histidine.

Recombinant proteins can be isolated using any technique well known in the art. Scopes (Protein Purification: Principles and Practice, Springer-Verlag, New York (1994)), for example, provides a number of general methods for purifying recombinant (and non-recombinant) proteins. The methods include, e.g., ion-exchange chromatography, size-exclusion chromatography, affinity chromatography, selective precipitation, dialysis, and hydrophobic interaction chromatography.

(P) Methods for Identifying Compounds for Use in the Prevention, Treatment, or Amelioration of One or More Colorectal Pathologies

Methods to Identify Compounds that Modulate the Expression or Activity of a Biomarker

The present invention provides methods of identifying compounds that bind to the products of the biomarkers of the invention. The present invention also provides methods for identifying compounds that modulate the expression and/or activity of the products of the biomarkers of the invention. The compounds identified via such methods are useful for the development of one or more animal models to study colorectal pathology, including polyps or one or more subtypes of polyps. Further, the compounds identified via such methods are useful as lead compounds in the development of prophylactic and therapeutic compositions for the prevention, treatment, and/or amelioration of one or more colorectal pathologies, including one or more polyps or one or more subtypes of polyps. Such methods are particularly useful in that the effort and great expense involved in testing potential prophylactics and therapeutics in vivo are efficiently focused on those compounds identified via the in vitro and ex vivo methods described herein.

The present invention provides a method for identifying a compound to be tested for an ability to prevent, treat, or ameliorate one or more colorectal pathologies, said method comprising: (a) contacting a cell expressing a protein product of one or more biomarkers of the invention or a fragment thereof, or an RNA product of one or more biomarkers of the invention or a portion thereof, with a test compound; and (b) determining the ability of the test compound to bind to the protein product, protein fragment, RNA product, or RNA portion, so that if the compound binds to the protein product, protein fragment, RNA product, or RNA portion, a compound to be tested for an ability to prevent, treat, or ameliorate one or more colorectal pathologies is identified. The cell, for example, can be a prokaryotic cell, yeast cell, viral cell, or a cell of mammalian origin. Determining the ability of the test compound to bind to the protein product, protein fragment, RNA product, or RNA portion can be accomplished, for example, by coupling the test compound with a radioisotope or enzymatic label such that binding of the test compound to the protein product, protein fragment, RNA product, or RNA portion can be determined by detecting the labeled compound in a complex. For example, test compounds can be labeled with ¹²⁵I, ³⁵S, ¹⁴C, or ³H, either directly or indirectly, and the radioisotope detected by direct counting of radioemission or by scintillation counting. Alternatively, test compounds can be enzymatically labeled with, for example, horseradish peroxidase, alkaline phosphatase, or luciferase, and the enzymatic label detected by determination of conversion of an appropriate substrate to product.
In a specific embodiment, the assay comprises contacting a cell that expresses a protein product of one or more biomarkers of the invention or a fragment thereof, or an RNA product of one or more biomarkers of the invention or a portion thereof, with a known compound that binds the protein product, protein fragment, RNA product, or RNA portion to form an assay mixture; contacting the assay mixture with a test compound; and determining the ability of the test compound to interact with the protein product, protein fragment, RNA product, or RNA portion, wherein determining the ability of the test compound to interact comprises determining the ability of the test compound to preferentially bind to the protein product, protein fragment, RNA product, or RNA portion as compared to the known compound.

Binding of the test compound to the protein product or protein fragment can be determined either directly or indirectly. In a specific embodiment, the assay includes contacting a protein product of one or more biomarkers of the invention or a fragment thereof, or one or more RNA products of one or more biomarkers of the invention or a portion thereof, with a known compound that binds the protein product, protein fragment, RNA product, or RNA portion to form an assay mixture; contacting the assay mixture with a test compound; and determining the ability of the test compound to interact with the protein product, protein fragment, RNA product, or RNA portion, wherein determining the ability of the test compound to interact comprises determining the ability of the test compound to preferentially bind to the protein product, protein fragment, RNA product, or RNA portion as compared to the known compound. Techniques well known in the art can be used to determine the binding between a test compound and a protein product of a biomarker of the invention or a fragment thereof, or an RNA product of a biomarker of the invention or a portion thereof.

In some embodiments of the above assay methods of the present invention, it may be desirable to immobilize an RNA product of a biomarker of the invention or a portion thereof, or its target molecule, to facilitate separation of complexed from uncomplexed forms of the RNA product or RNA portion, the target molecule, or both, as well as to accommodate automation of the assay. In other embodiments of the above assay methods, it may be desirable to immobilize either a protein product of a biomarker of the invention or a fragment thereof, or its target molecule, to facilitate separation of complexed from uncomplexed forms of one or both of the proteins, as well as to accommodate automation of the assay. Binding of a test compound to a protein product of a biomarker of the invention or a fragment thereof can be accomplished in any vessel suitable for containing the reactants. Examples of such vessels include microtiter plates, test tubes, and microcentrifuge tubes. In one embodiment, a fusion protein can be provided that adds a domain allowing one or both of the proteins to be bound to a matrix. For example, glutathione-S-transferase (GST) fusion proteins can be adsorbed onto glutathione Sepharose beads (Sigma Chemical; St. Louis, Mo.) or glutathione-derivatized microtiter plates, which are then combined with the test compound, or with the test compound and either the non-adsorbed target protein or a protein product of a biomarker of the invention or a fragment thereof, and the mixture is incubated under conditions conducive to complex formation (e.g., at physiological conditions for salt and pH). Following incubation, the beads or microtiter plate wells are washed to remove any unbound components, and complex formation is measured either directly or indirectly, for example, as described above.
Alternatively, the complexes can be dissociated from the matrix, and the level of binding of a protein product of a biomarker of the invention or a fragment thereof can be determined using standard techniques.

Other techniques for immobilizing proteins on matrices can also be used in the screening assays of the invention. For example, either a protein product of a biomarker of the invention or a fragment thereof, or a target molecule, can be immobilized utilizing conjugation of biotin and streptavidin. A biotinylated protein product of a biomarker of the invention or a biotinylated target molecule can be prepared from biotin-NHS (N-hydroxysuccinimide) using techniques well known in the art (e.g., biotinylation kit, Pierce Chemicals; Rockford, Ill.), and immobilized in the wells of streptavidin-coated 96-well plates (Pierce Chemical). Alternatively, antibodies reactive with a protein product of a biomarker of the invention or a fragment thereof can be derivatized to the wells of the plate, and protein trapped in the wells by antibody conjugation. Methods for detecting such complexes, in addition to those described above for the GST-immobilized complexes, include immunodetection of complexes using antibodies reactive with a protein product of a biomarker of the invention, as well as enzyme-linked assays that rely on detecting an enzymatic activity associated with a protein product of a biomarker of the invention or a fragment thereof, or with the target molecule.

The interaction or binding of a protein product of a biomarker of the invention or a fragment thereof to a test compound can also be determined using such proteins or protein fragments as “bait proteins” in a two-hybrid or three-hybrid assay (see, e.g., U.S. Pat. No. 5,283,317; Zervos et al. (1993) Cell 72:223-232; Madura et al. (1993) J. Biol. Chem. 268:12046-12054; Bartel et al. (1993) Bio/Techniques 14:920-924; Iwabuchi et al. (1993) Oncogene 8:1693-1696; and International Publication No. WO 94/10300).

The present invention provides a method for identifying a compound to be tested for an ability to prevent, treat, or ameliorate one or more colorectal pathologies, said method comprising: (a) contacting a cell expressing a protein or RNA product of one or more biomarkers of the invention with a test compound; (b) determining the amount of the protein or RNA product present in (a); and (c) comparing the amount in (a) to that present in a corresponding control cell that has not been contacted with the test compound, so that if the amount of the protein or RNA product is altered relative to the amount in the control, a compound to be tested for an ability to prevent, treat, or ameliorate one or more colorectal pathologies is identified. In a specific embodiment, the expression level(s) is altered by 5%, 10%, 15%, 25%, 30%, 40%, 50%, 5 to 25%, 10 to 30%, at least 1 fold, at least 1.5 fold, at least 2 fold, 4 fold, 5 fold, 10 fold, 25 fold, 1 to 10 fold, or 5 to 25 fold relative to the expression level in the control, as determined by utilizing an assay described herein (e.g., a microarray or QRT-PCR) or an assay well known to one of skill in the art. In alternate embodiments, such a method comprises determining the amount of the protein or RNA products of at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, 1 to 3, 1 to 5, 1-8, all, or any combination of the biomarkers of the invention present in the cell and comparing the amounts to those present in the control.
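The percent and fold thresholds above can be made concrete with a short calculation. The Python sketch below is illustrative only and is not part of the described method: the function names are hypothetical, and because the text does not define how a fold change is computed, the sketch assumes fold change is the treated/control ratio reported as a magnitude of at least 1, so that a 2-fold decrease and a 2-fold increase both count as "2 fold".

```python
from typing import Optional


def percent_change(treated: float, control: float) -> float:
    """Signed percent change of the treated level relative to the control."""
    if control <= 0:
        raise ValueError("control level must be positive")
    return 100.0 * (treated - control) / control


def fold_change(treated: float, control: float) -> float:
    """Magnitude of the treated/control ratio, always >= 1.

    Assumption: a decrease is reported by inverting the ratio, so
    treated = control / 2 gives 2.0, the same as a 2-fold increase.
    """
    if treated <= 0 or control <= 0:
        raise ValueError("levels must be positive")
    ratio = treated / control
    return ratio if ratio >= 1.0 else 1.0 / ratio


def is_altered(
    treated: float,
    control: float,
    min_percent: Optional[float] = None,
    min_fold: Optional[float] = None,
) -> bool:
    """True if the change meets every threshold that was supplied."""
    if min_percent is None and min_fold is None:
        raise ValueError("supply min_percent and/or min_fold")
    if min_percent is not None and abs(percent_change(treated, control)) < min_percent:
        return False
    if min_fold is not None and fold_change(treated, control) < min_fold:
        return False
    return True


print(percent_change(30.0, 20.0), fold_change(30.0, 20.0))
print(is_altered(30.0, 20.0, min_percent=25), is_altered(30.0, 20.0, min_fold=2))
```

For example, a treated level of 30 against a control level of 20 is a 50% increase (1.5 fold), which meets a 25% threshold but not a 2-fold threshold.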

The cells utilized in the cell-based assays described herein can be engineered to express a biomarker of the invention utilizing techniques known in the art. See, e.g., Section III entitled “Recombinant Expression Vectors and Host Cells” of U.S. Pat. No. 6,245,527, which is incorporated herein by reference. Alternatively, cells that endogenously express a biomarker of the invention can be used. For example, intestinal cells may be used.

In a specific embodiment, intestinal cells are isolated from a “normal” individual or an individual with one or more colorectal pathologies and are incubated in the presence and absence of a test compound for varying amounts of time (e.g., 30 min, 1 hr, 5 hr, 24 hr, 48 hr, and 96 hr). When screening for prophylactic or therapeutic agents, a clone of the full sequence of a biomarker of the invention or a functional portion thereof is used to transfect the cells. The transfected cells are cultured for varying amounts of time (e.g., 1, 2, 3, 5, 7, 10, or 14 days) in the presence or absence of test compound. Following incubation, target nucleic acid samples are prepared from the cells and hybridized to a nucleic acid probe corresponding to a nucleic acid sequence that is differentially expressed in individuals with one or more colorectal pathologies. The nucleic acid probe is labeled, for example, with a radioactive label, according to methods well known in the art and described herein. Hybridization is carried out by northern blot, for example as described in Ausubel et al., supra, or Sambrook et al., supra. The differential hybridization, as defined herein, of the target to the samples on the array from normal relative to RNA from samples having one or more colorectal pathologies is indicative of the level of expression of RNA corresponding to a differentially expressed specific nucleic acid sequence. A change in the level of expression of the target sequence as a result of the incubation step in the presence of the test compound is indicative of a compound that increases or decreases the expression of the corresponding polyp-biomarker-specific nucleic acid sequence.

The present invention also provides a method for identifying a compound to be tested for an ability to prevent, treat, or ameliorate one or more colorectal pathologies, said method comprising: (a) contacting a cell-free extract (e.g., an intestinal cell extract) with a nucleic acid sequence encoding a protein or RNA product of one or more biomarkers of the invention and a test compound; (b) determining the amount of the protein or RNA product present in (a); and (c) comparing the amount(s) in (a) to that present in a corresponding control that has not been contacted with the test compound, so that if the amount of the protein or RNA product is altered relative to the amount in the control, a compound to be tested for an ability to prevent, treat, or ameliorate one or more colorectal pathologies is identified. In a specific embodiment, the expression level(s) is altered by 5%, 10%, 15%, 25%, 30%, 40%, 50%, 5 to 25%, 10 to 30%, at least 1 fold, at least 1.5 fold, at least 2 fold, 4 fold, 5 fold, 10 fold, 25 fold, 1 to 10 fold, or 5 to 25 fold relative to the expression level in the control sample, as determined by utilizing an assay described herein (e.g., a microarray or QRT-PCR) or an assay well known to one of skill in the art. In alternate embodiments, such a method comprises determining the amount of a protein or RNA product of at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, 1 to 3, 1 to 5, 1-8, all, or any combination of the biomarkers of the invention present in the extract and comparing the amounts to those present in the control.

In certain embodiments, the amount of RNA product of a biomarker of the invention is determined; in other embodiments, the amount of protein product of a biomarker of the invention is determined; while in still other embodiments, the amounts of both RNA and protein products of a biomarker of the invention are determined. Standard methods and compositions for determining the amount of RNA or protein products of a biomarker of the invention can be utilized. Such methods and compositions are described in detail above.

Kits to Identify Compounds that Modulate the Expression or Activity of a Biomarker

In specific embodiments, in a screening assay described herein, the amount of protein or RNA product of a biomarker of the invention is determined utilizing kits. Such kits comprise materials and reagents required for measuring the expression of at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, or at least 8 protein or RNA products of at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, all, or any combination of the biomarkers of the invention. In specific embodiments, the kits may further comprise one or more additional reagents employed in the various methods, such as: (1) reagents for purifying RNA from blood; (2) primers for generating test nucleic acids; (3) dNTPs and/or rNTPs (either premixed or separate), optionally with one or more uniquely labeled dNTPs and/or rNTPs (e.g., biotinylated or Cy3- or Cy5-tagged dNTPs); (4) post-synthesis labeling reagents, such as chemically active derivatives of fluorescent dyes; (5) enzymes, such as reverse transcriptases, DNA polymerases, and the like; (6) various buffer media, e.g., hybridization and washing buffers; (7) labeled probe purification reagents and components, such as spin columns; (8) protein purification reagents; and (9) signal generation and detection reagents, e.g., streptavidin-alkaline phosphatase conjugate, chemifluorescent or chemiluminescent substrate, and the like. In particular embodiments, the kits comprise prelabeled, quality-controlled protein and/or RNA transcript (in some embodiments, mRNA) for use as a control.

In some embodiments, the kits are QRT-PCR kits. In other embodiments, the kits are nucleic acid array or protein array kits. Such kits according to the subject invention will at least comprise an array having associated protein or nucleic acid members of the invention and packaging means therefor. Alternatively, the protein or nucleic acid members of the invention may be prepackaged onto an array.

In a specific embodiment, kits for measuring an RNA product of a biomarker of the invention comprise materials and reagents that are necessary for measuring the expression of the RNA product. For example, a microarray or QRT-PCR kit may be used and may contain only those reagents and materials necessary for measuring the levels of RNA products of at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, all, or any combination of the biomarkers of the invention. Alternatively, in some embodiments, the kits can comprise materials and reagents that are not limited to those required to measure the levels of RNA products of 1, 2, 3, 4, 5, 6, 7, 8, all, or any combination of the biomarkers of the invention. For example, a microarray kit may contain reagents and materials necessary for measuring the levels of RNA products of 1, 2, 3, 4, 5, 6, 7, 8, all, or any combination of the biomarkers of the invention, in addition to reagents and materials necessary for measuring the levels of the RNA products of at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, or more genes other than the biomarkers of the invention. In a specific embodiment, a microarray or QRT-PCR kit contains reagents and materials necessary for measuring the levels of RNA products of at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, all, or any combination of the biomarkers of the invention, and 1, 2, 3, 4, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 125, 150, 175, 200, 225, 250, 300, 350, 400, 450, or more genes that are not biomarkers of the invention, or 1-10, 1-100, 1-150, 1-200, 1-300, 1-400, 1-500, 1-1000, 25-100, 25-200, 25-300, 25-400, 25-500, 25-1000, 100-150, 100-200, 100-300, 100-400, 100-500, 100-1000, or 500-1000 or more genes that are not biomarkers of the invention.

For nucleic acid microarray kits, the kits generally comprise probes attached or localized to a support surface. The probes may be labeled with a detectable label. In a specific embodiment, the probes are specific for the 5′ region, the 3′ region, the internal coding region, an exon(s), an intron(s), an exon junction(s), or an exon-intron junction(s) of 1, 2, 3, 4, 5, 6, 7, 8, all, or any combination of the biomarkers of the invention. The microarray kits may comprise instructions for performing the assay and methods for interpreting and analyzing the data resulting from the performance of the assay. The kits may also comprise hybridization reagents and/or reagents necessary for detecting a signal produced when a probe hybridizes to a target nucleic acid sequence. Generally, the materials and reagents for the microarray kits are in one or more containers. Each component of the kit is generally in its own suitable container.

For QRT-PCR kits, the kits generally comprise pre-selected primers specific for particular RNA products (e.g., an exon(s), an intron(s), an exon junction(s), or an exon-intron junction(s)) of 1, 2, 3, 4, 5, 6, 7, 8, all, or any combination of the biomarkers of the invention. The QRT-PCR kits may also comprise enzymes suitable for reverse transcribing and/or amplifying nucleic acids (e.g., polymerases such as Taq and enzymes such as reverse transcriptase), as well as the deoxynucleotides and buffers needed for the reverse transcription and amplification reaction mixtures. The QRT-PCR kits may also comprise biomarker-specific sets of primers specific for 1, 2, 3, 4, 5, 6, 7, 8, all, or any combination of the biomarkers of the invention. The QRT-PCR kits may also comprise biomarker-specific probes that are specific for the sequences amplified from 1, 2, 3, 4, 5, 6, 7, 8, all, or any combination of the biomarkers of the invention using the biomarker-specific sets of primers. The probes may or may not be labeled with a detectable label (e.g., a fluorescent label). In some embodiments, when multiplexing is contemplated, it is helpful if each probe is labeled with a different detectable label (e.g., FAM or HEX). Each component of the QRT-PCR kit is generally in its own suitable container. Thus, these kits generally comprise distinct containers suitable for each individual reagent, enzyme, primer, and probe. Further, the QRT-PCR kits may comprise instructions for performing the assay and methods for interpreting and analyzing the data resulting from the performance of the assay.
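The multiplexing point above, that probes sharing a reaction are best labeled with distinct detectable labels, can be checked mechanically when a kit's probe set is designed. A minimal Python sketch follows; the probe names and the dictionary layout are hypothetical, and the check compares dye names only, not spectral overlap between different dyes.

```python
from collections import defaultdict
from typing import Dict, List, Optional


def dye_conflicts(probe_labels: Dict[str, Optional[str]]) -> Dict[str, List[str]]:
    """Return each dye that is assigned to more than one probe in one reaction.

    probe_labels maps a (hypothetical) probe name to its fluorescent label,
    e.g. "FAM" or "HEX"; unlabeled probes (None) are skipped.
    """
    probes_by_dye: Dict[str, List[str]] = defaultdict(list)
    for probe, dye in probe_labels.items():
        if dye is not None:
            # Normalize case so "fam" and "FAM" count as the same dye.
            probes_by_dye[dye.upper()].append(probe)
    return {
        dye: sorted(probes)
        for dye, probes in probes_by_dye.items()
        if len(probes) > 1
    }


multiplex = {"probe_marker_1": "FAM", "probe_marker_2": "HEX", "probe_marker_3": "fam"}
print(dye_conflicts(multiplex))
```

An empty result means every labeled probe in the reaction carries its own dye; here the sketch reports that two probes share FAM.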

For antibody-based kits, the kit can comprise, for example: (1) a first antibody (which may or may not be attached to a support) that binds to a protein of interest (e.g., a protein product of 1, 2, 3, 4, 5, 6, 7, 8, all, or any combination of the biomarkers of the invention); and, optionally, (2) a second, different antibody that binds to either the protein or the first antibody and is conjugated to a detectable label (e.g., a fluorescent label, radioactive isotope, or enzyme). The antibody-based kits may also comprise beads for conducting an immunoprecipitation. Each component of the antibody-based kits is generally in its own suitable container. Thus, these kits generally comprise distinct containers suitable for each antibody. Further, the antibody-based kits may comprise instructions for performing the assay and methods for interpreting and analyzing the data resulting from the performance of the assay.

Reporter gene-based assays may also be conducted to identify a compound to be tested for an ability to prevent, treat, or ameliorate one or more colorectal pathologies. In a specific embodiment, the present invention provides a method for identifying a compound to be tested for an ability to prevent, treat, or ameliorate one or more colorectal pathologies, said method comprising: (a) contacting a compound with a cell expressing a reporter gene construct comprising a reporter gene operably linked to a regulatory element of a biomarker of the invention (e.g., a promoter/enhancer element); (b) measuring the expression of said reporter gene; and (c) comparing the amount in (a) to that present in a corresponding control cell that has not been contacted with the test compound, so that if the amount of expressed reporter gene is altered relative to the amount in the control cell, a compound to be tested for an ability to prevent, treat, or ameliorate one or more colorectal pathologies is identified. In accordance with this embodiment, the cell may naturally express the biomarker or be engineered to express the biomarker. In another embodiment, the present invention provides a method for identifying a compound to be tested for an ability to prevent, treat, or ameliorate one or more colorectal pathologies, said method comprising: (a) contacting a compound with a cell-free extract and a reporter gene construct comprising a reporter gene operably linked to a regulatory element of a biomarker of the invention (e.g., a promoter/enhancer element); (b) measuring the expression of said reporter gene; and (c) comparing the amount in (a) to that present in a corresponding control that has not been contacted with the test compound, so that if the amount of expressed reporter gene is altered relative to the amount in the control, a compound to be tested for an ability to prevent, treat, or ameliorate one or more colorectal pathologies is identified.

Any reporter gene well known to one of skill in the art may be used in reporter gene constructs used in accordance with the methods of the invention. A reporter gene is a nucleotide sequence encoding an RNA transcript or protein that is readily detectable either by its presence (by, e.g., RT-PCR, Northern blot, Western blot, ELISA, etc.) or by its activity. Non-limiting examples of reporter genes are listed in Table 10. Reporter genes may be obtained, and the nucleotide sequence of the elements determined, by any method well known to one of skill in the art. The nucleotide sequence of a reporter gene can be obtained, e.g., from the literature or a database such as GenBank. Alternatively, a polynucleotide encoding a reporter gene may be generated from nucleic acid from a suitable source. If a clone containing a nucleic acid encoding a particular reporter gene is not available, but the sequence of the reporter gene is known, a nucleic acid encoding the reporter gene may be chemically synthesized or obtained from a suitable source (e.g., a cDNA library, or a cDNA library generated from, or nucleic acid, preferably poly A+ RNA, isolated from, any tissue or cells expressing the reporter gene) by PCR amplification. Once the nucleotide sequence of a reporter gene is determined, the nucleotide sequence may be manipulated using methods well known in the art for the manipulation of nucleotide sequences, e.g., recombinant DNA techniques, site-directed mutagenesis, PCR, etc. (see, for example, the techniques described in Sambrook et al., 1990, Molecular Cloning, A Laboratory Manual, 2d Ed., Cold Spring Harbor Laboratory, Cold Spring Harbor, N.Y. and Ausubel et al., eds., 1998, Current Protocols in Molecular Biology, John Wiley & Sons, NY, which are both incorporated by reference herein in their entireties), to generate reporter genes having a different amino acid sequence, for example to create amino acid substitutions, deletions, and/or insertions.

In accordance with the invention, cells that naturally or normally express one or more, all, or any combination of the biomarkers of the invention can be used in the methods described herein. Alternatively, cells can be engineered to express one or more, all, or any combination of the biomarkers of the invention, or a reporter gene, using techniques well known in the art, and used in the methods described herein. Examples of such techniques include, but are not limited to, calcium phosphate precipitation (see, e.g., Graham & Van der Eb, 1978, Virol. 52:546), dextran-mediated transfection, calcium phosphate-mediated transfection, polybrene-mediated transfection, protoplast fusion, electroporation, encapsulation of the nucleic acid in liposomes, and direct microinjection of the nucleic acid into nuclei.

In a specific embodiment, the cells used in the methods described herein are intestinal cells or cell lines, lymphocytes (T or B lymphocytes), monocytes, neutrophils, macrophages, eosinophils, basophils, erythrocytes, or platelets. In a preferred embodiment, the cells used in the methods described herein are intestinal cells. In another preferred embodiment, the cells used in the methods described herein are lymphocytes. In another embodiment, the cells used in the methods described herein are immortalized cell lines derived from a source, e.g., a tissue.

Any cell-free extract that permits the translation, and optionally but preferably, the transcription, of a nucleic acid can be used in accordance with the methods described herein. The cell-free extract may be isolated from cells of any species origin. For example, the cell-free translation extract may be isolated from human cells, cultured mouse cells, cultured rat cells, Chinese hamster ovary (CHO) cells, Xenopus oocytes, rabbit reticulocytes, wheat germ, or rye embryo (see, e.g., Krieg & Melton, 1984, Nature 308:203 and Dignam et al., 1990, Methods Enzymol. 182:194-203). Alternatively, the cell-free translation extract, e.g., rabbit reticulocyte lysates and wheat germ extract, can be purchased from, e.g., Promega (Madison, Wis.). In a preferred embodiment, the cell-free extract is an extract isolated from human cells. In a specific embodiment, the human cells are HeLa cells, lymphocytes, or intestinal cells or cell lines.

In addition to the ability to modulate the expression levels of RNA and/or protein products of a biomarker of the invention, it may be desirable, at least in certain instances, that compounds modulate the activity of a protein product of a biomarker of the invention. Thus, the present invention provides methods of identifying compounds to be tested for an ability to prevent, treat, or ameliorate one or more colorectal pathologies, comprising methods for identifying compounds that modulate the activity of a protein product of one or more biomarkers of the invention. Such methods can comprise: (a) contacting a cell expressing a protein product of one or more biomarkers of the invention with a test compound; (b) determining the activity level of the protein product; and (c) comparing the activity level to that in a corresponding control cell that has not been contacted with the test compound, so that if the level of activity in (a) is altered relative to the level of activity in the control cell, a compound to be tested for an ability to prevent, treat, or ameliorate one or more colorectal pathologies is identified. In a specific embodiment, the activity level(s) is altered by 5%, 10%, 15%, 25%, 30%, 40%, 50%, 5 to 25%, 10 to 30%, at least 1 fold, at least 1.5 fold, at least 2 fold, 4 fold, 5 fold, 10 fold, 25 fold, 1 to 10 fold, or 5 to 25 fold relative to the activity level in the control as determined by utilizing an assay described herein (e.g., a microarray or QRT-PCR) or an assay well known to one of skill in the art. In alternate embodiments, such a method comprises determining the activity level of a protein product of at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 12, at least 15, 1 to 5, 1 to 10, 5 to 10, 5 to 15, or 10 to 15, or more, or all or any combination of the biomarkers of the invention present in the cell and comparing the activity levels to those present in the control.

The present invention provides methods of identifying compounds to be tested for an ability to prevent, treat, or ameliorate one or more colorectal pathologies, comprising: (a) contacting a cell-free extract with a nucleic acid encoding a protein product of one or more biomarkers of the invention and a test compound; (b) determining the activity level of the protein product; and (c) comparing the activity level to that in a corresponding control that has not been contacted with the test compound, so that if the level of activity in (a) is altered relative to the level of activity in the control, a compound to be tested for an ability to prevent, treat, or ameliorate one or more colorectal pathologies is identified. In a specific embodiment, the activity level(s) is altered by 5%, 10%, 15%, 25%, 30%, 40%, 50%, 5 to 25%, 10 to 30%, at least 1 fold, at least 1.5 fold, at least 2 fold, 4 fold, 5 fold, 10 fold, 25 fold, 1 to 10 fold, or 5 to 25 fold relative to the activity level in the control as determined by utilizing an assay described herein (e.g., a microarray or QRT-PCR) or an assay well known to one of skill in the art. In alternate embodiments, such a method comprises determining the activity level of a protein product of at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, 1 to 3, 1 to 5, or 1 to 8, or all or any combination, of the biomarkers of the invention present in the sample and comparing the activity levels to those present in the control.
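
The comparison in steps (b) and (c) of the two screening methods above can be illustrated with a short calculation. The following Python sketch is illustrative only and is not part of the described methods: the biomarker names, the activity values, and the use of 5% (the smallest alteration recited above) as the default threshold are hypothetical assumptions. It computes the percent change and fold change of each protein product's activity relative to the untreated control and flags the biomarkers whose activity is altered by at least the chosen threshold.

```python
def percent_change(treated: float, control: float) -> float:
    """Signed percent change of the treated activity level relative to the control."""
    if control == 0:
        raise ValueError("control activity level must be non-zero")
    return 100.0 * (treated - control) / control


def fold_change(treated: float, control: float) -> float:
    """Ratio of treated to control activity (1.0 means unchanged)."""
    if treated <= 0 or control <= 0:
        raise ValueError("activity levels must be positive")
    return treated / control


def altered_biomarkers(treated: dict, control: dict, min_percent: float = 5.0) -> list:
    """Names of biomarkers whose activity changed by at least min_percent in either direction.

    `treated` and `control` map hypothetical biomarker names to measured activity levels;
    only biomarkers measured in both conditions are compared.
    """
    hits = []
    for name, control_level in control.items():
        if name in treated and abs(percent_change(treated[name], control_level)) >= min_percent:
            hits.append(name)
    return sorted(hits)


# Hypothetical activity readings (arbitrary units) for three biomarker protein products.
control_cells = {"BM_A": 100.0, "BM_B": 80.0, "BM_C": 50.0}
treated_cells = {"BM_A": 104.0, "BM_B": 40.0, "BM_C": 125.0}
print(altered_biomarkers(treated_cells, control_cells))  # prints ['BM_B', 'BM_C']
```

A decrease is reported as a negative percent change and a fold change below 1.0; whether, for example, a fold change of 0.5 is described as a "2 fold" alteration is a reporting convention the sketch leaves to the user.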

Standard techniques can be utilized to determine the level of activity of a protein product of a biomarker of the invention. Activities of protein products of biomarkers of the invention can be determined using techniques well known in the art.

Method to Utilize the Biological Activity of the Compounds

Upon identification of compounds to be tested for an ability to prevent, treat, or ameliorate one or more colorectal pathologies (for convenience referred to herein as a “lead” compound), the compounds can be further investigated. For example, the compounds identified via the present methods can be further tested in vivo in accepted animal models of polyp formation. Further, the compounds identified via the methods can be analyzed with respect to their specificity. Techniques for such additional compound investigation are well known to one of skill in the art.

In one embodiment, the effect of a lead compound can be assayed by measuring the cell growth or viability of the target cell. Such assays can be carried out with representative cells of cell types involved in polyp formation (e.g., intestinal cells, cells isolated from different portions of the gastrointestinal system, and the like). Alternatively, instead of culturing cells from a patient, a lead compound may be screened using cells of a cell line.

Many assays well-known in the art can be used to assess the survival and/or growth of a patient cell or cell line following exposure to a lead compound. For example, cell proliferation can be assayed by measuring bromodeoxyuridine (BrdU) incorporation (see, e.g., Hoshino et al., 1986, Int. J. Cancer 38:369; Campana et al., 1988, J. Immunol. Meth. 107:79) or (³H)-thymidine incorporation (see, e.g., Chen, J., 1996, Oncogene 13:1395-403; Jeoung, J., 1995, J. Biol. Chem. 270:18367-73), by direct cell count, or by detecting changes in transcription, translation or activity of known genes such as proto-oncogenes (e.g., fos, myc) or cell cycle markers (Rb, cdc2, cyclin A, D1, D2, D3, E, etc.). The levels of such protein and RNA (e.g., mRNA) and activity can be determined by any method well known in the art. For example, protein can be quantitated by known immunological methods such as Western blotting or immunoprecipitation using commercially available antibodies. mRNA can be quantitated using methods that are well known and routine in the art, for example, northern analysis, RNase protection, or the polymerase chain reaction in conjunction with reverse transcription. Cell viability can be assessed by using trypan-blue staining or other cell death or viability markers known in the art. In a specific embodiment, the level of cellular ATP is measured to determine cell viability. Differentiation can be assessed, for example, visually based on changes in morphology.
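
For the ATP-based viability measurement mentioned above, readings are commonly expressed as a percentage of vehicle-treated control cells after subtracting a cell-free background reading. The Python sketch below shows that normalization; the background subtraction, the function and parameter names, and the readings are assumptions for illustration and are not taken from the described methods.

```python
def percent_viability(signal: float, vehicle_signal: float, background: float = 0.0) -> float:
    """Viability of compound-treated cells as a percentage of vehicle-treated cells.

    signal:         ATP readout (e.g., luminescence) from compound-treated wells
    vehicle_signal: readout from vehicle-only (untreated) wells
    background:     readout from cell-free wells, subtracted from both
    """
    window = vehicle_signal - background
    if window <= 0:
        raise ValueError("vehicle signal must exceed background")
    return 100.0 * (signal - background) / window


# Hypothetical luminescence readings for one lead compound at three doses.
vehicle, blank = 10500.0, 500.0
for dose, reading in [(0.1, 10100.0), (1.0, 5500.0), (10.0, 1500.0)]:
    print(dose, percent_viability(reading, vehicle, blank))
```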

Animal Models

Compounds can be tested in suitable animal model systems prior to use in humans. Such animal model systems include, but are not limited to, rats, mice, chickens, cows, monkeys, pigs, dogs, rabbits, etc. Any animal system well-known in the art may be used. In certain embodiments, compounds are tested in a mouse model. Compounds can be administered repeatedly.

Accepted animal models can be utilized to determine the efficacy of the compounds identified via the methods described above for the prevention, treatment, and/or amelioration of one or more colorectal pathologies.

Toxicity

The toxicity and/or efficacy of a compound identified in accordance with the invention can be determined by standard pharmaceutical procedures in cell cultures or experimental animals, e.g., for determining the LD₅₀ (the dose lethal to 50% of the population) and the ED₅₀ (the dose therapeutically effective in 50% of the population). Cells and cell lines that can be used to assess the cytotoxicity of a compound identified in accordance with the invention include, but are not limited to, peripheral blood mononuclear cells (PBMCs), Caco-2 cells, and Huh7 cells. The dose ratio between toxic and therapeutic effects is the therapeutic index, and it can be expressed as the ratio LD₅₀/ED₅₀. A compound identified in accordance with the invention that exhibits a large therapeutic index is preferred. While a compound identified in accordance with the invention that exhibits toxic side effects may be used, care should be taken to design a delivery system that targets such agents to the site of affected tissue in order to minimize potential damage to unaffected cells and, thereby, reduce side effects.
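
Because the therapeutic index defined above is a simple ratio, candidate compounds can be ranked by it directly. The Python sketch below does so; the compound names and the LD₅₀ and ED₅₀ values are hypothetical, and both doses are assumed to be expressed in the same units (e.g., mg/kg).

```python
def therapeutic_index(ld50: float, ed50: float) -> float:
    """Therapeutic index LD50/ED50; larger values indicate a wider safety margin."""
    if ld50 <= 0 or ed50 <= 0:
        raise ValueError("LD50 and ED50 must be positive")
    return ld50 / ed50


def rank_by_therapeutic_index(compounds: dict) -> list:
    """Return (name, index) pairs sorted from largest to smallest therapeutic index.

    `compounds` maps a compound name to an (LD50, ED50) pair in the same units.
    """
    scored = [(name, therapeutic_index(ld50, ed50)) for name, (ld50, ed50) in compounds.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical dose estimates (mg/kg) for three lead compounds.
leads = {"lead-1": (100.0, 10.0), "lead-2": (300.0, 10.0), "lead-3": (50.0, 25.0)}
print(rank_by_therapeutic_index(leads))
# prints [('lead-2', 30.0), ('lead-1', 10.0), ('lead-3', 2.0)]
```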

The data obtained from the cell culture assays and animal studies can be used in formulating a range of dosage of a compound identified in accordance with the invention for use in humans. The dosage of such agents lies preferably within a range of circulating concentrations that include the ED₅₀ with little or no toxicity. The dosage may vary within this range depending upon the dosage form employed and the route of administration utilized. For any agent used in the method of the invention, the therapeutically effective dose can be estimated initially from cell culture assays. A dose may be formulated in animal models to achieve a circulating plasma concentration range that includes the IC₅₀ (i.e., the concentration of the compound that achieves a half-maximal inhibition of symptoms) as determined in cell culture. Such information can be used to more accurately determine useful doses in humans. Levels in plasma may be measured, for example, by high performance liquid chromatography.
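
As a rough illustration of estimating an IC₅₀ from cell culture data, the Python sketch below interpolates linearly on log₁₀(concentration) between the two tested concentrations that bracket 50% inhibition. This simple interpolation is substituted here for illustration only; dose-response data are more often fitted with a sigmoidal (e.g., four-parameter logistic) model. The sketch assumes inhibition rises with concentration, and the concentrations and inhibition values are hypothetical.

```python
import math


def estimate_ic50(concentrations, inhibition_percent):
    """Estimate the concentration giving 50% inhibition by log-linear interpolation.

    Returns None if no pair of adjacent tested concentrations brackets 50% inhibition.
    """
    points = sorted(zip(concentrations, inhibition_percent))
    for (c_low, i_low), (c_high, i_high) in zip(points, points[1:]):
        if c_low <= 0:
            raise ValueError("concentrations must be positive for log interpolation")
        if i_low <= 50.0 <= i_high and i_high > i_low:
            fraction = (50.0 - i_low) / (i_high - i_low)
            log_ic50 = math.log10(c_low) + fraction * (math.log10(c_high) - math.log10(c_low))
            return 10 ** log_ic50
    return None


# Hypothetical inhibition data (percent) at concentrations in micromolar.
print(estimate_ic50([1.0, 10.0, 100.0], [20.0, 40.0, 60.0]))  # about 31.6
```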

Design of Congeners or Analogs

The compounds which display the desired biological activity can be used as lead compounds for the development or design of congeners or analogs having useful pharmacological activity. For example, once a lead compound is identified, molecular modeling techniques can be used to design variants of the compound that can be more effective. Examples of molecular modeling systems are the CHARM and QUANTA programs (Polygen Corporation, Waltham, Mass.). CHARM performs the energy minimization and molecular dynamics functions. QUANTA performs the construction, graphic modeling and analysis of molecular structure. QUANTA allows interactive construction, modification, visualization, and analysis of the behavior of molecules with each other.

A number of articles review computer modeling of drugs interacting with specific proteins, such as Rotivinen et al., 1988, Acta Pharmaceutical Fennica 97:159-166; Ripka, 1998, New Scientist 54-57; McKinlay & Rossmann, 1989, Annu. Rev. Pharmacol. Toxicol. 29:111-122; Perry & Davies, QSAR: Quantitative Structure-Activity Relationships in Drug Design pp. 189-193 (Alan R. Liss, Inc. 1989); Lewis & Dean, 1989, Proc. R. Soc. Lond. 236:125-140 and 141-162; Askew et al., 1989, J. Am. Chem. Soc. 111:1082-1090. Other computer programs that screen and graphically depict chemicals are available from companies such as BioDesign, Inc. (Pasadena, Calif.), Allelix, Inc. (Mississauga, Ontario, Canada), and Hypercube, Inc. (Cambridge, Ontario). Although these are primarily designed for application to drugs specific to particular proteins, they can be adapted to the design of drugs specific to any identified region. The analogs and congeners can be tested for binding to the proteins of interest (i.e., the protein products of a biomarker of the invention) using the above-described screens for biologic activity. Alternatively, lead compounds with little or no biologic activity, as ascertained in the screen, can also be used to design analogs and congeners of the compound that have biologic activity.

Compounds

Compounds that can be tested and identified using the methods described herein can include, but are not limited to, compounds obtained from any commercial source, including Aldrich (1001 West St. Paul Ave., Milwaukee, Wis. 53233), Sigma Chemical (P.O. Box 14508, St. Louis, Mo. 63178), Fluka Chemie AG (Industriestrasse 25, CH-9471 Buchs, Switzerland (Fluka Chemical Corp. 980 South 2nd Street, Ronkonkoma, N.Y. 11779)), Eastman Chemical Company, Fine Chemicals (P.O. Box 431, Kingsport, Tenn. 37662), Boehringer Mannheim GmbH (Sandhofer Strasse 116, D-68298 Mannheim), Takasago (4 Volvo Drive, Rockleigh, N.J. 07647), SST Corporation (635 Brighton Road, Clifton, N.J. 07012), Ferro (111 West Irene Road, Zachary, La. 70791), Riedel-deHaen Aktiengesellschaft (P.O. Box D-30918, Seelze, Germany), and PPG Industries Inc., Fine Chemicals (One PPG Place, 34th Floor, Pittsburgh, Pa. 15272). Further, any kind of natural product may be screened using the methods of the invention, including microbial, fungal, plant or animal extracts.

Compounds from large libraries of synthetic or natural compounds can be screened. Numerous means are currently used for random and directed synthesis of saccharide, peptide, and nucleic acid-based compounds. Synthetic compound libraries are commercially available from a number of companies including Maybridge Chemical Co. (Trevillet, Cornwall, UK), Comgenex (Princeton, N.J.), Brandon Associates (Merrimack, N.H.), and Microsource (New Milford, Conn.). A rare chemical library is available from Aldrich (Milwaukee, Wis.). Combinatorial libraries are commercially available or can be prepared. Alternatively, libraries of natural compounds in the form of bacterial, fungal, plant and animal extracts are available from, e.g., Pan Laboratories (Bothell, Wash.) or MycoSearch (N.C.), or are readily producible by methods well known in the art. Additionally, natural and synthetically produced libraries and compounds are readily modified through conventional chemical, physical, and biochemical means.

Furthermore, diversity libraries of test compounds, including small molecule test compounds, may be utilized. Libraries screened using the methods of the present invention can comprise a variety of types of compounds. Examples of libraries that can be screened in accordance with the methods of the invention include, but are not limited to, peptoids; random biooligomers; diversomers such as hydantoins, benzodiazepines and dipeptides; vinylogous polypeptides; nonpeptidal peptidomimetics; oligocarbamates; peptidyl phosphonates; peptide nucleic acid libraries; antibody libraries; carbohydrate libraries; and small molecule libraries (in some embodiments, small organic molecule libraries). In some embodiments, the compounds in the libraries screened are nucleic acid or peptide molecules. In a non-limiting example, peptide molecules can exist in a phage display library. In other embodiments, the types of compounds include, but are not limited to, peptide analogs including peptides comprising non-naturally occurring amino acids, e.g., D-amino acids, phosphorus analogs of amino acids, such as α-amino phosphonic acids and α-amino phosphinic acids, or amino acids having non-peptide linkages, nucleic acid analogs such as phosphorothioates and PNAs, hormones, antigens, synthetic or naturally occurring drugs, opiates, dopamine, serotonin, catecholamines, thrombin, acetylcholine, prostaglandins, organic molecules, pheromones, adenosine, sucrose, glucose, lactose and galactose. Libraries of polypeptides or proteins can also be used in the assays of the invention.

In a specific embodiment, the combinatorial libraries are small organic molecule libraries including, but not limited to, benzodiazepines, isoprenoids, thiazolidinones, metathiazanones, pyrrolidines, and morpholino compounds. In another embodiment, the combinatorial libraries comprise peptoids; random bio-oligomers; benzodiazepines; diversomers such as hydantoins, benzodiazepines and dipeptides; vinylogous polypeptides; nonpeptidal peptidomimetics; oligocarbamates; peptidyl phosphonates; peptide nucleic acid libraries; antibody libraries; or carbohydrate libraries. Combinatorial libraries are themselves commercially available. For example, libraries may be commercially obtained from Specs and BioSpecs B.V. (Rijswijk, The Netherlands), Chembridge Corporation (San Diego, Calif.), Contract Service Company (Dolgoprudny, Moscow Region, Russia), Comgenex USA Inc. (Princeton, N.J.), Maybridge Chemicals Ltd. (Cornwall PL34 OHW, United Kingdom), Asinex (Moscow, Russia), ComGenex (Princeton, N.J.), Tripos, Inc. (St. Louis, Mo.), ChemStar, Ltd (Moscow, Russia), 3D Pharmaceuticals (Exton, Pa.), and Martek Biosciences (Columbia, Md.).

In a preferred embodiment, the library is preselected so that the compounds of the library are more amenable to cellular uptake. For example, compounds are selected based on specific parameters such as, but not limited to, size, lipophilicity, hydrophilicity, and hydrogen bonding, which enhance the likelihood of compounds getting into the cells. In another embodiment, the compounds are analyzed by three-dimensional or four-dimensional computer computation programs.
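
The passage above does not specify numerical cut-offs for preselecting a library. One widely used heuristic covering the listed properties is Lipinski's "rule of five" (molecular weight no greater than 500, calculated logP no greater than 5, no more than 5 hydrogen-bond donors, and no more than 10 hydrogen-bond acceptors, with at most one violation tolerated); it is substituted here purely as an illustration. The Python sketch assumes the descriptors have already been computed by some cheminformatics tool, and the compound names and values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CompoundDescriptors:
    name: str
    molecular_weight: float  # daltons
    clogp: float             # calculated logP, a measure of lipophilicity
    h_bond_donors: int
    h_bond_acceptors: int


def rule_of_five_violations(c: CompoundDescriptors) -> int:
    """Count how many of the four rule-of-five limits a compound exceeds."""
    return sum([
        c.molecular_weight > 500,
        c.clogp > 5,
        c.h_bond_donors > 5,
        c.h_bond_acceptors > 10,
    ])


def preselect(library, max_violations: int = 1):
    """Keep compounds with at most `max_violations` rule-of-five violations."""
    return [c for c in library if rule_of_five_violations(c) <= max_violations]


library = [
    CompoundDescriptors("cmpd-1", 350.4, 2.1, 2, 5),  # 0 violations
    CompoundDescriptors("cmpd-2", 650.8, 6.2, 3, 8),  # 2 violations
    CompoundDescriptors("cmpd-3", 520.6, 4.0, 1, 4),  # 1 violation
]
print([c.name for c in preselect(library)])  # prints ['cmpd-1', 'cmpd-3']
```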

The combinatorial compound library for use in accordance with the methods of the present invention may be synthesized. There is a great interest in synthetic methods directed toward the creation of large collections of small organic compounds, or libraries, which could be screened for pharmacological, biological or other activity. The synthetic methods applied to create vast combinatorial libraries are performed in solution or in the solid phase, i.e., on a support. Solid-phase synthesis makes it easier to conduct multi-step reactions and to drive reactions to completion with high yields because excess reagents can be easily added and washed away after each reaction step. Solid-phase combinatorial synthesis also tends to improve isolation, purification and screening. However, the more traditional solution phase chemistry supports a wider variety of organic reactions than solid-phase chemistry.

Combinatorial compound libraries of the present invention may be synthesized using the apparatus described in U.S. Pat. No. 6,190,619 to Kilcoin et al., which is hereby incorporated by reference in its entirety. U.S. Pat. No. 6,190,619 discloses a synthesis apparatus capable of holding a plurality of reaction vessels for parallel synthesis of multiple discrete compounds or for combinatorial libraries of compounds.

In one embodiment, the combinatorial compound library can be synthesized in solution. The method disclosed in U.S. Pat. No. 6,194,612 to Boger et al., which is hereby incorporated by reference in its entirety, features compounds useful as templates for solution phase synthesis of combinatorial libraries. The template is designed to permit reaction products to be easily purified from unreacted reactants using liquid/liquid or solid/liquid extractions. The compounds produced by combinatorial synthesis using the template will in some embodiments be small organic molecules. Some compounds in the library may mimic the effects of non-peptides or peptides. In contrast to solid phase synthesis of combinatorial compound libraries, liquid phase synthesis does not require the use of specialized protocols for monitoring the individual steps of a multistep solid phase synthesis (Egner et al., 1995, J. Org. Chem. 60:2652; Anderson et al., 1995, J. Org. Chem. 60:2650; Fitch et al., 1994, J. Org. Chem. 59:7955; Look et al., 1994, J. Org. Chem. 49:7588; Metzger et al., 1993, Angew. Chem., Int. Ed. Engl. 32:894; Youngquist et al., 1994, Rapid Commun. Mass Spect. 8:77; Chu et al., 1995, J. Am. Chem. Soc. 117:5419; Brummel et al., 1994, Science 264:399; and Stevanovic et al., 1993, Bioorg. Med. Chem. Lett. 3:431).

Combinatorial compound libraries useful for the methods of the present invention can be synthesized on supports. In one embodiment, a split synthesis method, a protocol of separating and mixing supports during the synthesis, is used to synthesize a library of compounds on supports (see, e.g., Lam et al., 1997, Chem. Rev. 97:411-448; Ohlmeyer et al., 1993, Proc. Natl. Acad. Sci. USA 90:10922-10926 and references cited therein). Each support in the final library has substantially one type of compound attached to its surface. Other methods for synthesizing combinatorial libraries on supports, wherein one product is attached to each support, will be known to those of skill in the art (see, e.g., Nefzi et al., 1997, Chem. Rev. 97:449-472).
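
The split synthesis described above can be simulated to show why it yields one compound per support. In the Python sketch below, which is an idealized model rather than a synthesis protocol, the beads are shuffled at each step and divided into equal portions, one per building block; each portion is coupled with its building block, and the portions are pooled again. The building-block labels and bead count are hypothetical. The number of possible compounds is the product of the number of building blocks used at each step.

```python
import itertools
import math
import random


def library_size(steps):
    """Number of possible compounds: the product of the building blocks used at each step."""
    return math.prod(len(blocks) for blocks in steps)


def enumerate_library(steps):
    """All possible compounds, written as building-block labels joined by '-'."""
    return {"-".join(combo) for combo in itertools.product(*steps)}


def split_and_pool(n_beads, steps, seed=0):
    """Simulate split-and-pool synthesis; returns the single compound carried by each bead."""
    rng = random.Random(seed)
    beads = [[] for _ in range(n_beads)]
    for blocks in steps:
        order = list(range(n_beads))
        rng.shuffle(order)  # split: randomize which portion each bead joins
        for position, bead_index in enumerate(order):
            # Assigning shuffled beads round-robin gives portions of equal (+/- 1) size;
            # every bead in a portion is coupled with that portion's building block.
            beads[bead_index].append(blocks[position % len(blocks)])
        # pool: all beads are recombined before the next step (implicit in this model)
    return ["-".join(bead) for bead in beads]


steps = [["A", "B", "C"], ["X", "Y"], ["1", "2", "3", "4"]]
print(library_size(steps))  # prints 24
beads = split_and_pool(48, steps)
print(len(set(beads)))      # distinct compounds represented on the 48 beads
```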

In some embodiments of the present invention, compounds can be attached to supports via linkers. Linkers can be integral and part of the support, or they can be non-integral linkers that are either synthesized on the support or attached thereto after synthesis. Linkers are useful not only for providing points of compound attachment to the support, but also for allowing different groups of molecules to be cleaved from the support under different conditions, depending on the nature of the linker. For example, linkers can be, inter alia, electrophilically cleaved, nucleophilically cleaved, photocleavable, enzymatically cleaved, cleaved by metals, cleaved under reductive conditions or cleaved under oxidative conditions. In a preferred embodiment, the compounds are cleaved from the support prior to high throughput screening of the compounds.

If the library comprises arrays or microarrays of compounds, wherein each compound has an address or identifier, the compound can be deconvoluted, e.g., by cross-referencing the positive sample to the original compound list that was applied to the individual test assays.
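
For an addressed library, deconvolution reduces to a lookup from the address of each positive test to the compound list applied to the assay. The Python sketch below shows this; the well addresses, compound identifiers, signal values, and the simple fixed hit threshold are hypothetical assumptions.

```python
def call_hits(readings: dict, threshold: float) -> list:
    """Addresses whose assay signal meets or exceeds the hit threshold."""
    return sorted(address for address, signal in readings.items() if signal >= threshold)


def deconvolute(plate_map: dict, positive_addresses: list) -> dict:
    """Cross-reference each positive address to the compound applied at that address."""
    unknown = [a for a in positive_addresses if a not in plate_map]
    if unknown:
        raise KeyError(f"addresses missing from the compound list: {unknown}")
    return {a: plate_map[a] for a in positive_addresses}


plate_map = {"A1": "CMPD-0001", "A2": "CMPD-0002", "B1": "CMPD-0003", "B2": "CMPD-0004"}
readings = {"A1": 0.21, "A2": 1.37, "B1": 0.95, "B2": 0.40}
hits = call_hits(readings, threshold=0.8)
print(deconvolute(plate_map, hits))  # prints {'A2': 'CMPD-0002', 'B1': 'CMPD-0003'}
```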

If the library is a peptide or nucleic acid library, the sequence of the compound can be determined by direct sequencing of the peptide or nucleic acid. Such methods are well known to one of skill in the art.

A number of physico-chemical techniques can be used for the de novo characterization of compounds. Examples of such techniques include, but are not limited to, mass spectrometry, NMR spectroscopy, X-ray crystallography and vibrational spectroscopy.

(Q) Use of Identified Compounds to Prevent, Treat, or Ameliorate One or More Colorectal Pathologies

The present invention provides methods of preventing, treating, or ameliorating one or more colorectal pathologies, said methods comprising administering to a subject in need thereof one or more compounds identified in accordance with the methods of the invention. In a preferred embodiment, the subject is a human.

In one embodiment, the invention provides a method of preventing, treating, or ameliorating one or more colorectal pathologies, said method comprising administering to an individual in need thereof a dose of a prophylactically or therapeutically effective amount of one or more compounds identified in accordance with the methods of the invention. In a specific embodiment, a compound identified in accordance with the methods of the invention is not administered to prevent, treat, or ameliorate one or more colorectal pathologies if such compound has been used previously to prevent, treat, or ameliorate one or more colorectal pathologies. In another embodiment, a compound identified in accordance with the methods of the invention is not administered to prevent, treat, or ameliorate one or more colorectal pathologies if such compound has been suggested for use to prevent, treat, or ameliorate one or more colorectal pathologies. In another embodiment, a compound identified in accordance with the methods of the invention specifically binds to and/or alters the expression and/or activity level of a protein or RNA product of only one biomarker of the invention. In another embodiment, a compound identified in accordance with the methods of the invention is not administered to prevent, treat, or ameliorate one or more colorectal pathologies if such compound binds to and/or alters the expression and/or activity of a protein or RNA product of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, or more, or all or any combination, of the biomarkers of Tables 2 or 6. In yet another embodiment, a compound identified in accordance with the methods of the invention binds to and/or alters the expression and/or activity level of a protein or RNA product of at least 2, at least 3, at least 4, at least 5, at least 10, at least 15, or more biomarkers of the invention.

The invention also provides methods of preventing, treating, or ameliorating one or more colorectal pathologies, said methods comprising administering to a subject in need thereof one or more of the compounds identified utilizing the screening methods described herein, and one or more other therapies (e.g., prophylactic or therapeutic agents and surgery). In a specific embodiment, such therapies are currently being used, have been used or are known to be useful in the prevention, treatment, or amelioration of one or more colorectal pathologies (including, but not limited to, the prophylactic or therapeutic agents listed herein). The therapies (e.g., prophylactic or therapeutic agents) of the combination therapies of the invention can be administered sequentially or concurrently. In a specific embodiment, the combination therapies of the invention comprise a compound identified in accordance with the invention and at least one other therapy that has the same mechanism of action as said compound. In another specific embodiment, the combination therapies of the invention comprise a compound identified in accordance with the methods of the invention and at least one other therapy (e.g., prophylactic or therapeutic agent) which has a different mechanism of action than said compound. The combination therapies of the present invention improve the prophylactic or therapeutic effect of a compound of the invention by functioning together with the compound to have an additive or synergistic effect. The combination therapies of the present invention reduce the side effects associated with the therapies (e.g., prophylactic or therapeutic agents).

The prophylactic or therapeutic agents of the combination therapies can be administered to a subject in the same pharmaceutical composition. Alternatively, the prophylactic or therapeutic agents of the combination therapies can be administered concurrently to a subject in separate pharmaceutical compositions. The prophylactic or therapeutic agents may be administered to a subject by the same or different routes of administration.

In a specific embodiment, a pharmaceutical composition comprising one or more compounds identified in an assay described herein is administered to an individual, in some embodiments a human, to prevent, treat, or ameliorate one or more colorectal pathologies. In accordance with the invention, the pharmaceutical composition may also comprise one or more prophylactic or therapeutic agents. In some embodiments, such agents are currently being used, have been used or are known to be useful in the prevention, treatment, or amelioration of one or more colorectal pathologies.

(R) Compounds of the Invention

Representative, non-limiting examples of compounds that can be used in accordance with the methods of the invention to prevent, treat, and/or ameliorate one or more colorectal pathologies are described in detail below.

First, such compounds can include, for example, antisense, ribozyme, or triple helix compounds that can downregulate the expression or activity of protein or RNA products of a biomarker of the invention. Such compounds are described in detail in the subsection below.

Second, such compounds can include, for example, antibody compositions that can modulate the expression of protein or RNA products of a biomarker of the invention, or the activity of protein products of a biomarker of the invention. In a specific embodiment, the antibody compositions downregulate the expression of protein or RNA products of a biomarker of the invention, or the activity of protein products of a biomarker of the invention. Such compounds are described in detail in the subsection below.

Third, such compounds can include, for example, protein products of a biomarker of the invention. The invention encompasses the use of peptides or peptide mimetics selected to mimic protein products of a biomarker of the invention to prevent, treat, or ameliorate one or more colorectal pathologies. Further, such compounds can include, for example, dominant-negative polypeptides that can modulate the expression of protein or RNA products of a biomarker of the invention, or the activity of protein products of a biomarker of the invention.

The methods also encompass the use of derivatives, analogs and fragments of protein products of a biomarker of the invention to prevent, treat, or ameliorate one or more colorectal pathologies. In particular, the invention encompasses the use of fragments of protein products of a biomarker of the invention comprising one or more domains of such a protein(s) to prevent, treat, or ameliorate one or more colorectal pathologies. In another specific embodiment, the invention encompasses the use of a protein product of a biomarker of the invention, or an analog, derivative or fragment of such a protein, which is expressed as a fusion or chimeric protein product (comprising the protein, fragment, analog, or derivative joined via a peptide bond to a heterologous protein sequence).

In specific embodiments, antisense oligonucleotides directed to at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 15, or more of the biomarkers of the invention are administered to prevent, treat, or ameliorate one or more colorectal pathologies. In other embodiments, one or more protein products of a biomarker of the invention, or a fragment, analog, or derivative thereof, are administered to prevent, treat, or ameliorate one or more colorectal pathologies. In other embodiments, one or more antibodies that specifically bind to protein products of the invention are administered to prevent, treat, or ameliorate one or more colorectal pathologies. In other embodiments, one or more dominant-negative polypeptides are administered to prevent, treat, or ameliorate one or more colorectal pathologies.

Antisense, Ribozyme, Triple-Helix Compositions

Standard techniques can be utilized to produce antisense, triple helix, or ribozyme molecules reactive to one or more of the genes listed in Tables 2 or 6, and transcripts of the genes listed in Tables 2 or 6, for use as part of the methods described herein. First, standard techniques can be utilized for the production of antisense nucleic acid molecules, i.e., molecules which are complementary to a sense nucleic acid encoding a polypeptide of interest, e.g., complementary to the coding strand of a double-stranded cDNA molecule or complementary to an mRNA sequence. Accordingly, an antisense nucleic acid can hydrogen bond to a sense nucleic acid. The antisense nucleic acid can be complementary to an entire coding strand, or to only a portion thereof, e.g., all or part of the protein coding region (or open reading frame). An antisense nucleic acid molecule can be antisense to all or part of a non-coding region of the coding strand of a nucleotide sequence encoding a polypeptide of interest. The non-coding regions (“5′ and 3′ untranslated regions”) are the 5′ and 3′ sequences that flank the coding region and are not translated into amino acids.
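
The complementarity described above can be made concrete: a sequence that base-pairs with a sense strand is its reverse complement, written 5′ to 3′. The Python sketch below derives it from a sense-strand DNA sequence and extracts an antisense oligonucleotide against a chosen window. The example sequence and window are hypothetical; only unmodified A, C, G and T are handled, and the sketch ignores modified nucleotides and target-site accessibility.

```python
_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def antisense(sense: str, rna: bool = False) -> str:
    """Reverse complement (5'->3') of a 5'->3' sense-strand DNA sequence.

    With rna=True the result is written with U in place of T.
    """
    seq = sense.upper()
    invalid = set(seq) - set(_COMPLEMENT)
    if invalid:
        raise ValueError(f"unsupported bases: {sorted(invalid)}")
    result = "".join(_COMPLEMENT[base] for base in reversed(seq))
    return result.replace("T", "U") if rna else result


def antisense_oligo(sense: str, start: int, length: int) -> str:
    """Antisense oligonucleotide complementary to sense[start:start + length] (0-based)."""
    if start < 0 or length <= 0 or start + length > len(sense):
        raise ValueError("window lies outside the sense sequence")
    return antisense(sense[start:start + length])


target = "ATGGCTAGCTTAGGCATCGA"  # hypothetical 20-nt sense sequence
print(antisense(target))
print(antisense_oligo(target, 0, 10))
```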

An antisense oligonucleotide can be, for example, about 5, 10, 15, 20,25, 30, 35, 40, 45 or 50 nucleotides or more in length. An antisensenucleic acid of the invention can be constructed using chemicalsynthesis and enzymatic ligation reactions using procedures known in theart. For example, an antisense nucleic acid (e.g., an antisenseoligonucleotide) can be chemically synthesized using naturally occurringnucleotides or variously modified nucleotides designed to increase thebiological stability of the molecules or to increase the physicalstability of the duplex formed between the antisense and sense nucleicacids, e.g., phosphorothioate derivatives and acridine substitutednucleotides can be used. Examples of modified nucleotides which can beused to generate the antisense nucleic acid include 5-fluorouracil,5-bromouracil, 5-chlorouracil, 5-iodouracil, hypoxanthine, xanthine,4-acetylcytosine, 5-(carboxy-hydroxylmethyl) uracil,5-carboxymethylaminomethyl-2-thiouridine,5-carboxymethylamino-methyluracil, dihydrouracil,beta-D-galactosylqueosine, inosine, N6-isopentenyladenine,1-methylguanine, 1-methylinosine, 2,2-dimethylguanine, 2-methyladenine,2-methylguanine, 3-methylcytosine, 5-methylcytosine, N6-adenine,7-methylguanine, 5-methylaminomethyluracil,5-methoxyaminomethyl-2-thiouracil, beta-D-mannosylqueosine,5′-methoxycarboxymethyluracil, 5-methoxyuracil,2-methylthio-N6-isopentenyladenine, uracil-5-oxyacetic acid (v),wybutoxosine, pseudouracil, queosine, 2-thiocytosine,5-methyl-2-thiouracil, 2-thiouracil, 4-thiouracil, 5-methyluracil,uracil-5-oxyacetic acid methylester, uracil-5-oxyacetic acid (v),5-methyl-2-thiouracil, 3-(3-amino-3-N-2-carboxypropyl) uracil, (acp3)w,and 2,6-diaminopurine. 
Alternatively, the antisense nucleic acid can be produced biologically using an expression vector into which a nucleic acid has been subcloned in an antisense orientation (i.e., RNA transcribed from the inserted nucleic acid will be of an antisense orientation to a target nucleic acid of interest).
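One simple way to enumerate candidate antisense oligonucleotides of a chosen length (for example 20 nucleotides, within the roughly 5 to 50 nucleotide range mentioned above) is to slide a window along the target and take the reverse complement of each window. The Python sketch below assumes unmodified DNA oligos and ignores the modified bases listed above; `tile_antisense`, `reverse_complement`, and their parameters are hypothetical names chosen for illustration.

```python
from typing import List, Tuple

# Translation table: DNA or RNA input -> complementary DNA bases.
_TO_COMPLEMENT = str.maketrans("ACGTU", "TGCAA")


def reverse_complement(seq: str) -> str:
    """Complement each base and reverse, giving the antiparallel DNA strand 5'->3'."""
    return seq.upper().translate(_TO_COMPLEMENT)[::-1]


def tile_antisense(target: str, length: int = 20, step: int = 1) -> List[Tuple[int, str]]:
    """List (start, oligo) pairs covering the target with antisense oligos.

    `start` is the 0-based position on the target of the first base the oligo
    covers; the oligo itself is written 5'->3'.
    """
    if length < 1 or step < 1:
        raise ValueError("length and step must be positive")
    if length > len(target):
        raise ValueError("oligo longer than target")
    return [
        (start, reverse_complement(target[start:start + length]))
        for start in range(0, len(target) - length + 1, step)
    ]


if __name__ == "__main__":
    # Hypothetical 10-nt target tiled with non-overlapping 5-mers.
    for start, oligo in tile_antisense("ACGTACGTAC", length=5, step=5):
        print(start, oligo)
```

Choosing among the candidates (accessibility of the target region, off-target matches, chemistry) is beyond this sketch.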

Antisense nucleic acid molecules are administered to a subject or generated in situ such that they hybridize with or bind to cellular mRNA encoding the polypeptide of interest, thereby inhibiting expression, e.g., by inhibiting transcription and/or translation. The hybridization can be by conventional nucleotide complementarity to form a stable duplex, or, for example, in the case of an antisense nucleic acid molecule which binds to DNA duplexes, through specific interactions in the major groove of the double helix. An example of a route of administration of antisense nucleic acid molecules of the invention is direct injection at a tissue site, e.g., a joint (e.g., a knee, hip, elbow, or knuckle). Alternatively, antisense nucleic acid molecules can be modified to target selected cells and then administered systemically. For example, for systemic administration, antisense molecules can be modified such that they specifically bind to receptors or antigens expressed on the surface of a selected cell, e.g., a T cell or intestinal cell, e.g., by linking the antisense nucleic acid molecules to peptides or antibodies which bind to cell surface receptors or antigens. The antisense nucleic acid molecules can also be delivered to cells using vectors, e.g., gene therapy vectors, described below. To achieve sufficient intracellular concentrations of the antisense molecules, vector constructs in which the antisense nucleic acid molecule is placed under the control of a strong pol II or pol III promoter are preferred.

An antisense nucleic acid molecule of interest can be an α-anomeric nucleic acid molecule. An α-anomeric nucleic acid molecule forms specific double-stranded hybrids with complementary RNA in which, contrary to the usual β-units, the strands run parallel to each other (Gaultier et al., 1987, Nucleic Acids Res. 15:6625-6641). The antisense nucleic acid molecule can also comprise a 2′-O-methylribonucleotide (Inoue et al., 1987, Nucleic Acids Res. 15:6131-6148) or a chimeric RNA-DNA analogue (Inoue et al., 1987, FEBS Lett. 215:327-330).

Ribozymes are catalytic RNA molecules with ribonuclease activity that are capable of cleaving a single-stranded nucleic acid, such as an mRNA, to which they have a complementary region, and they can also be generated using standard techniques. Thus, ribozymes (e.g., hammerhead ribozymes, described in Haseloff and Gerlach, 1988, Nature 334:585-591) can be used to catalytically cleave mRNA transcripts and thereby inhibit translation of the protein encoded by the mRNA. A ribozyme having specificity for a nucleic acid molecule encoding a polypeptide of interest can be designed based upon the nucleotide sequence of a cDNA disclosed herein. For example, a derivative of a Tetrahymena L-19 IVS RNA can be constructed in which the nucleotide sequence of the active site is complementary to the nucleotide sequence to be cleaved in the target mRNA. See, e.g., Cech et al., U.S. Pat. No. 4,987,071; and Cech et al., U.S. Pat. No. 5,116,742. Alternatively, an mRNA encoding a polypeptide of interest can be used to select a catalytic RNA having a specific ribonuclease activity from a pool of RNA molecules. See, e.g., Bartel and Szostak, 1993, Science 261:1411-1418.
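Designing a hammerhead ribozyme begins with locating a suitable cleavage triplet in the target mRNA and deriving target-binding arms complementary to the sequence around it. The Python sketch below scans for GUC triplets, the classic hammerhead target, and returns arms that pair with the `arm_len` bases on either side. This flanking rule is a simplifying assumption for illustration (published designs differ in details such as whether the triplet's first base is paired), the catalytic core that would sit between the two arms is deliberately left out, and `find_guc_sites` and `CandidateSite` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List

# RNA complement table; the ribozyme arms are RNA.
_RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def rna_reverse_complement(seq: str) -> str:
    """Antiparallel complementary RNA, written 5'->3'."""
    return seq.translate(_RNA_COMPLEMENT)[::-1]


@dataclass
class CandidateSite:
    guc_start: int      # 0-based index of the G in the GUC triplet
    cleaved_after: int  # 0-based index of the C; cleavage occurs on its 3' side
    arm_5prime: str     # ribozyme 5' arm, pairs with target bases 3' of the triplet
    arm_3prime: str     # ribozyme 3' arm, pairs with target bases 5' of the triplet


def find_guc_sites(mrna: str, arm_len: int = 8) -> List[CandidateSite]:
    """Scan an mRNA (T accepted as U) for GUC triplets with arm_len bases on each side."""
    rna = mrna.upper().replace("T", "U")
    sites = []
    for i in range(arm_len, len(rna) - 3 - arm_len + 1):
        if rna[i:i + 3] != "GUC":
            continue
        upstream = rna[i - arm_len:i]
        downstream = rna[i + 3:i + 3 + arm_len]
        sites.append(CandidateSite(
            guc_start=i,
            cleaved_after=i + 2,
            arm_5prime=rna_reverse_complement(downstream),
            arm_3prime=rna_reverse_complement(upstream),
        ))
    return sites


if __name__ == "__main__":
    # Hypothetical target fragment with short 4-nt arms for readability.
    for site in find_guc_sites("AAAAGUCUUUU", arm_len=4):
        print(site)
```

Under these assumptions a complete ribozyme would read 5′ arm, catalytic core, 3′ arm.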

Triple helical structures can also be generated using well-known techniques. For example, expression of a polypeptide of interest can be inhibited by targeting nucleotide sequences complementary to the regulatory region of the gene encoding the polypeptide (e.g., the promoter and/or enhancer) to form triple helical structures that prevent transcription of the gene in target cells. See generally Helene, 1991, Anticancer Drug Des. 6(6):569-84; Helene, 1992, Ann. N.Y. Acad. Sci. 660:27-36; and Maher, 1992, BioEssays 14(12):807-15.
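Triplex-forming oligonucleotides are generally directed at homopurine/homopyrimidine stretches of duplex DNA, such as those that can occur in promoter regions. As a sketch only, the Python function below scans a promoter sequence for such stretches on either strand; the name `homopurine_tracts` and the default minimum length of 12 bases are illustrative assumptions, not values from this document.

```python
import re
from typing import List, Tuple


def homopurine_tracts(dna: str, min_len: int = 12) -> List[Tuple[int, int, str]]:
    """Return (start, end, strand) for purine runs of at least min_len bases.

    strand "+" means the purines are on the strand as given; "-" means the given
    strand carries a pyrimidine run, so its complementary strand is the purine one.
    Coordinates are 0-based and end-exclusive, on the given strand.
    """
    if min_len < 1:
        raise ValueError("min_len must be positive")
    seq = dna.upper()
    hits = []
    for pattern, strand in ((r"[AG]{%d,}" % min_len, "+"), (r"[CT]{%d,}" % min_len, "-")):
        for match in re.finditer(pattern, seq):
            hits.append((match.start(), match.end(), strand))
    return sorted(hits)


if __name__ == "__main__":
    # Hypothetical promoter fragment with one tract on each strand.
    print(homopurine_tracts("TTAGGAGAAGCCTTCCTA", min_len=6))
```

Whether a given tract actually supports a stable triplex depends on motif, length, and conditions, which this scan does not model.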

In various embodiments, nucleic acid compositions can be modified at the base moiety, sugar moiety, or phosphate backbone to improve, e.g., the stability, hybridization, or solubility of the molecule. For example, the deoxyribose phosphate backbone of the nucleic acids can be modified to generate peptide nucleic acids (see Hyrup et al., 1996, Bioorganic & Medicinal Chemistry 4(1):5-23). As used herein, the terms “peptide nucleic acids” or “PNAs” refer to nucleic acid mimics, e.g., DNA mimics, in which the deoxyribose phosphate backbone is replaced by a pseudopeptide backbone and only the four natural nucleobases are retained. The neutral backbone of PNAs has been shown to allow for specific hybridization to DNA and RNA under conditions of low ionic strength. The synthesis of PNA oligomers can be performed using standard solid phase peptide synthesis protocols as described in Hyrup et al., 1996, supra; and Perry-O'Keefe et al., 1996, Proc. Natl. Acad. Sci. USA 93:14670-675.

PNAs can, for example, be modified, e.g., to enhance their stability or cellular uptake, by attaching lipophilic or other helper groups to PNA, by the formation of PNA-DNA chimeras, or by the use of liposomes or other techniques of drug delivery known in the art. For example, PNA-DNA chimeras can be generated which may combine the advantageous properties of PNA and DNA. Such chimeras allow DNA recognition enzymes, e.g., RNase H and DNA polymerases, to interact with the DNA portion while the PNA portion provides high binding affinity and specificity. PNA-DNA chimeras can be linked using linkers of appropriate lengths selected in terms of base stacking, number of bonds between the nucleobases, and orientation (Hyrup, 1996, supra). The synthesis of PNA-DNA chimeras can be performed as described in Hyrup, 1996, supra, and Finn et al., 1996, Nucleic Acids Res. 24(17):3357-63. For example, a DNA chain can be synthesized on a support using standard phosphoramidite coupling chemistry and modified nucleoside analogs. Compounds such as 5′-(4-methoxytrityl)amino-5′-deoxy-thymidine phosphoramidite can be used as a link between the PNA and the 5′ end of DNA (Mag et al., 1989, Nucleic Acids Res. 17:5973-88). PNA monomers are then coupled in a stepwise manner to produce a chimeric molecule with a 5′ PNA segment and a 3′ DNA segment (Finn et al., 1996, Nucleic Acids Res. 24(17):3357-63). Alternatively, chimeric molecules can be synthesized with a 5′ DNA segment and a 3′ PNA segment (Petersen et al., 1995, Bioorganic Med. Chem. Lett. 5:1119-1124).

In other embodiments, the oligonucleotide may include other appended groups such as peptides (e.g., for targeting host cell receptors in vivo), or agents facilitating transport across the cell membrane (see, e.g., Letsinger et al., 1989, Proc. Natl. Acad. Sci. USA 86:6553-6556; Lemaitre et al., 1987, Proc. Natl. Acad. Sci. USA 84:648-652; International Publication No. WO 88/09810) or the blood-brain barrier (see, e.g., International Publication No. WO 89/10134). In addition, oligonucleotides can be modified with hybridization-triggered cleavage agents (see, e.g., Krol et al., 1988, BioTechniques 6:958-976) or intercalating agents (see, e.g., Zon, 1988, Pharm. Res. 5:539-549). To this end, the oligonucleotide may be conjugated to another molecule, e.g., a peptide, hybridization-triggered cross-linking agent, transport agent, hybridization-triggered cleavage agent, etc.

Antibody Compositions

In one embodiment, antibodies that specifically bind to one or more protein products of one or more biomarkers of the invention are administered to an individual, in some embodiments a human, to prevent, treat, or ameliorate one or more colorectal pathologies. In another embodiment, any combination of antibodies that specifically bind to one or more protein products of one or more biomarkers of the invention is administered to a subject, in some embodiments a human, to prevent, treat, or ameliorate one or more colorectal pathologies. In a specific embodiment, one or more antibodies that specifically bind to one or more protein products of one or more biomarkers of the invention are administered to a subject, in some embodiments a human, in combination with other types of therapies to prevent, treat, or ameliorate one or more colorectal pathologies.

One or more antibodies that specifically bind to one or more protein products of one or more biomarkers of the invention can be administered to a subject, in some embodiments a human, using various delivery systems known to those of skill in the art. For example, such antibodies can be administered by encapsulation in liposomes, microparticles, or microcapsules. See, e.g., U.S. Pat. No. 5,762,904, U.S. Pat. No. 6,004,534, and International Publication No. WO 99/52563. In addition, such antibodies can be administered using recombinant cells capable of expressing the antibodies, or using retroviral, other viral, or non-viral vectors capable of expressing the antibodies.

Antibodies that specifically bind one or more protein products of one or more biomarkers of the invention can be obtained from any known source. For example, Table 5 provides a list of commercially available antibodies specific for one or more of the protein products of the biomarkers of the invention. Alternatively, antibodies that specifically bind to one or more protein products of one or more biomarkers of the invention can be produced by any method known in the art for the synthesis of antibodies, in particular, by chemical synthesis or, in some embodiments, by recombinant expression techniques.

Antibodies include, but are not limited to, polyclonal antibodies, monoclonal antibodies, bispecific antibodies, multispecific antibodies, human antibodies, humanized antibodies, camelised antibodies, chimeric antibodies, single-chain Fvs (scFv) (see, e.g., Bird et al. (1988) Science 242:423-426; and Huston et al. (1988) Proc. Natl. Acad. Sci. USA 85:5879-5883), single chain antibodies, single domain antibodies, Fab fragments, F(ab′) fragments, disulfide-linked Fvs (sdFv), and anti-idiotypic (anti-Id) antibodies (including, e.g., anti-Id antibodies to antibodies of the invention), and epitope-binding fragments of any of the above. The term “antibody”, as used herein, refers to immunoglobulin molecules and immunologically active fragments of immunoglobulin molecules, i.e., molecules that contain an antigen binding site. Immunoglobulin molecules can be of any type (e.g., IgG, IgE, IgM, IgD, IgA and IgY), class (e.g., IgG₁, IgG₂, IgG₃, IgG₄, IgA₁ and IgA₂) or subclass. Examples of immunologically active fragments of immunoglobulin molecules include F(ab) fragments (a monovalent fragment consisting of the VL, VH, CL and CH1 domains) and F(ab′)2 fragments (a bivalent fragment comprising two Fab fragments linked by a disulfide bridge at the hinge region), which can be generated by treating the antibody with an enzyme such as papain (to produce Fab fragments) or pepsin (to produce F(ab′)2 fragments). Immunologically active fragments also include, but are not limited to, Fd fragments (consisting of the VH and CH1 domains), Fv fragments (consisting of the VL and VH domains of a single arm of an antibody), dAb fragments (consisting of a VH domain; Ward et al., (1989) Nature 341:544-546), and isolated complementarity determining regions (CDRs). Antibodies that specifically bind to an antigen can be produced by any method known in the art for the synthesis of antibodies, in particular, by chemical synthesis or, in some embodiments, by recombinant expression techniques.

Polyclonal antibodies that specifically bind to an antigen can be produced by various procedures well known in the art. For example, a human antigen can be administered to various host animals including, but not limited to, rabbits, mice, rats, etc. to induce the production of sera containing polyclonal antibodies specific for the human antigen. Various adjuvants may be used to increase the immunological response, depending on the host species, and include, but are not limited to, Freund's (complete and incomplete), mineral gels such as aluminum hydroxide, surface active substances such as lysolecithin, pluronic polyols, polyanions, peptides, oil emulsions, keyhole limpet hemocyanins, dinitrophenol, and potentially useful human adjuvants such as BCG (bacille Calmette-Guerin) and Corynebacterium parvum. Such adjuvants are also well known in the art.

The term “monospecific antibody” refers to an antibody that displays a single binding specificity and affinity for a particular target, e.g., an epitope. This term includes monoclonal antibodies. Monoclonal antibodies can be prepared using a wide variety of techniques known in the art, including the use of hybridoma, recombinant, and phage display technologies, or a combination thereof. See, e.g., U.S. Pat. No. RE 32,011, U.S. Pat. Nos. 4,902,614, 4,543,439, 4,411,993 and 4,196,265; Kennett et al. (eds.), Monoclonal Antibodies, Hybridomas: A New Dimension in Biological Analyses, Plenum Press (1980); and Harlow and Lane (eds.), Antibodies: A Laboratory Manual, Cold Spring Harbor Laboratory Press (1988), which are incorporated herein by reference. For example, monoclonal antibodies can be produced using hybridoma techniques including those known in the art and taught, for example, in Harlow et al., Antibodies: A Laboratory Manual (Cold Spring Harbor Laboratory Press, 2nd ed. 1988); and Hammerling et al., in: Monoclonal Antibodies and T-Cell Hybridomas 563-681 (Elsevier, N.Y., 1981) (said references incorporated by reference in their entireties). Other techniques that enable the production of antibodies through recombinant techniques (e.g., techniques described by William D. Huse et al., 1989, Science 246:1275-1281; L. Sastry et al., 1989, Proc. Natl. Acad. Sci. USA 86:5728-5732; and Michelle Alting-Mees et al., Strategies in Molecular Biology 3:1-9 (1990), involving a commercial system available from Stratacyte, La Jolla, Calif.) may also be utilized to construct monoclonal antibodies. The term “monoclonal antibody” as used herein is not limited to antibodies produced through hybridoma technology; it refers to an antibody that is derived from a single clone, including any eukaryotic, prokaryotic, or phage clone, and not the method by which it is produced.

Methods for producing and screening for specific antibodies using hybridoma technology are routine and well known in the art. Briefly, mice can be immunized with a protein product of a biomarker of the invention, and once an immune response is detected, e.g., antibodies specific for the protein are detected in the mouse serum, the mouse spleen is harvested and splenocytes are isolated. The splenocytes are then fused by well-known techniques to any suitable myeloma cells, for example cells from cell line SP2/0 available from the ATCC. Hybridomas are selected and cloned by limiting dilution. Additionally, a RIMMS (repetitive immunization multiple sites) technique can be used to immunize an animal (Kilpatrick et al., 1997, Hybridoma 16:381-9, incorporated by reference in its entirety). The hybridoma clones are then assayed by methods known in the art for cells that secrete antibodies capable of binding a polypeptide of the invention. Ascites fluid, which generally contains high levels of antibodies, can be generated by injecting mice with positive hybridoma clones.
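Cloning by limiting dilution relies on seeding so few cells per well that wells which grow out are likely to have started from a single cell. Assuming cells are distributed among wells according to a Poisson distribution with mean λ cells per well, and that every seeded cell grows (both simplifying assumptions), the fraction of growing wells that are monoclonal is P(1) / (1 - P(0)) = λe^(-λ) / (1 - e^(-λ)). The Python sketch below evaluates this expression; the function names are hypothetical.

```python
import math


def poisson_pmf(k: int, mean: float) -> float:
    """Probability that a well receives exactly k cells when the mean is `mean`."""
    return mean ** k * math.exp(-mean) / math.factorial(k)


def monoclonal_fraction(mean: float) -> float:
    """Fraction of growing (non-empty) wells that were seeded with exactly one cell."""
    if mean <= 0:
        raise ValueError("mean cells per well must be positive")
    return poisson_pmf(1, mean) / (1.0 - poisson_pmf(0, mean))


if __name__ == "__main__":
    for mean in (0.3, 0.5, 1.0):
        print(f"{mean:.1f} cells/well -> {monoclonal_fraction(mean):.3f} monoclonal")
```

Under these assumptions, lowering the seeding density raises the chance that a positive well is monoclonal, at the cost of more empty wells.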

Accordingly, the present invention provides methods of generating antibodies by culturing a hybridoma cell secreting an antibody of the invention wherein, in some embodiments, the hybridoma is generated by fusing splenocytes isolated from a mouse immunized with a protein product of a biomarker of the invention with myeloma cells, and then screening the hybridomas resulting from the fusion for hybridoma clones that secrete an antibody able to bind to the protein or protein fragment.

Antibody fragments which recognize specific epitopes of a protein product of a biomarker of the invention may be generated by any technique known to those of skill in the art. For example, Fab and F(ab′)2 fragments of the invention may be produced by proteolytic cleavage of immunoglobulin molecules, using enzymes such as papain (to produce Fab fragments) or pepsin (to produce F(ab′)2 fragments). F(ab′)2 fragments contain the variable region, the light chain constant region and the CH1 domain of the heavy chain. Further, the antibodies of the present invention can also be generated using various phage display methods known in the art.

In phage display methods, functional antibody domains are displayed on the surface of phage particles which carry the polynucleotide sequences encoding them. In particular, DNA sequences encoding VH and VL domains are amplified from animal cDNA libraries (e.g., human or murine cDNA libraries of affected tissues). The DNA encoding the VH and VL domains is recombined together with an scFv linker by PCR and cloned into a phagemid vector. The vector is electroporated into E. coli, and the E. coli is infected with helper phage. Phage used in these methods are typically filamentous phage, including fd and M13, and the VH and VL domains are usually recombinantly fused to either the phage gene III or gene VIII. Phage expressing an antigen binding domain that binds to a particular antigen can be selected or identified with antigen, e.g., using labeled antigen or antigen bound or captured to a solid surface or bead. Examples of phage display methods that can be used to make the antibodies of the present invention include those disclosed in Brinkman et al., 1995, J. Immunol. Methods 182:41-50; Ames et al., 1995, J. Immunol. Methods 184:177-186; Kettleborough et al., 1994, Eur. J. Immunol. 24:952-958; Persic et al., 1997, Gene 187:9-18; Burton et al., 1994, Advances in Immunology 57:191-280; PCT Application No. PCT/GB91/01134; International Publication Nos. WO 90/02809, WO 91/10737, WO 92/01047, WO 92/18619, WO 93/11236, WO 95/15982, WO 95/20401, and WO 97/13844; and U.S. Pat. Nos. 5,698,426, 5,223,409, 5,403,484, 5,580,717, 5,427,908, 5,750,753, 5,821,047, 5,571,698, 5,516,637, 5,780,225, 5,658,727, 5,733,743 and 5,969,108; each of which is incorporated herein by reference in its entirety.

As described in the above references, after phage selection, the antibody coding regions from the phage can be isolated and used to generate whole antibodies, including human antibodies, or any other desired antigen binding fragment, and expressed in any desired host, including mammalian cells, insect cells, plant cells, yeast, and bacteria, e.g., as described below. Techniques to recombinantly produce Fab, Fab′ and F(ab′)2 fragments can also be employed using methods known in the art such as those disclosed in International Publication No. WO 92/22324; Mullinax et al., 1992, BioTechniques 12(6):864-869; Sawai et al., 1995, AJRI 34:26-34; and Better et al., 1988, Science 240:1041-1043 (said references incorporated by reference in their entireties).

To generate whole antibodies, PCR primers including VH or VL nucleotide sequences, a restriction site, and a flanking sequence to protect the restriction site can be used to amplify the VH or VL sequences in scFv clones. Utilizing cloning techniques known to those of skill in the art, the PCR-amplified VH domains can be cloned into vectors expressing a VH constant region, e.g., the human gamma 4 constant region, and the PCR-amplified VL domains can be cloned into vectors expressing a VL constant region, e.g., human kappa or lambda constant regions. In some embodiments, the vectors for expressing the VH or VL domains comprise an EF-1α promoter, a secretion signal, a cloning site for the variable domain, constant domains, and a selection marker such as neomycin resistance. The VH and VL domains may also be cloned into one vector expressing the necessary constant regions. The heavy chain conversion vectors and light chain conversion vectors are then co-transfected into cell lines to generate stable or transient cell lines that express full-length antibodies, e.g., IgG, using techniques known to those of skill in the art.
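The primer layout described above (an annealing sequence, a restriction site, and a protective flanking sequence) can be sketched in Python. The EcoRI (GAATTC) and BamHI (GGATCC) sites, the four-base `GCGC` clamp, the 20-base default annealing length, and the function name `v_region_primers` are illustrative assumptions; practical designs would also consider melting temperature and reading frame.

```python
from typing import Tuple

_DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Antiparallel complementary DNA strand, written 5'->3'."""
    return seq.upper().translate(_DNA_COMPLEMENT)[::-1]


def v_region_primers(v_region: str, site_5: str, site_3: str,
                     anneal_len: int = 20, clamp: str = "GCGC") -> Tuple[str, str]:
    """Return (forward, reverse) primers, each written 5'->3'.

    Each primer = protective flanking clamp + restriction site + bases that anneal
    to one end of the V region. The reverse primer anneals to the 3' end, so its
    annealing part is the reverse complement of that end.
    """
    v = v_region.upper()
    if len(v) < anneal_len:
        raise ValueError("V region shorter than the annealing length")
    # An internal site would cut the insert. Checking one strand is enough for
    # palindromic sites such as EcoRI and BamHI.
    for site in (site_5, site_3):
        if site.upper() in v:
            raise ValueError(f"site {site} occurs inside the V region")
    forward = clamp + site_5.upper() + v[:anneal_len]
    reverse = clamp + site_3.upper() + reverse_complement(v[-anneal_len:])
    return forward, reverse


if __name__ == "__main__":
    # Hypothetical 15-nt "V region" with a short 6-nt annealing length.
    print(v_region_primers("ATGCCCAAATTTGGG", "GAATTC", "GGATCC", anneal_len=6))
```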

For some uses, including in vivo use of antibodies in humans and in vitro detection assays, it may be preferable to use human or chimeric antibodies. Completely human antibodies are particularly desirable for therapeutic treatment of human subjects. Human antibodies can be made by a variety of methods known in the art, including the phage display methods described above using antibody libraries derived from human immunoglobulin sequences. See also U.S. Pat. Nos. 4,444,887 and 4,716,111; and International Publication Nos. WO 98/46645, WO 98/50433, WO 98/24893, WO 98/16654, WO 96/34096, WO 96/33735, and WO 91/10741; each of which is incorporated herein by reference in its entirety.

Antibodies can also be produced by a transgenic animal. In particular, human antibodies can be produced using transgenic mice which are incapable of expressing functional endogenous immunoglobulins, but which can express human immunoglobulin genes. For example, the human heavy and light chain immunoglobulin gene complexes may be introduced randomly or by homologous recombination into mouse embryonic stem cells. Alternatively, the human variable region, constant region, and diversity region may be introduced into mouse embryonic stem cells in addition to the human heavy and light chain genes. The mouse heavy and light chain immunoglobulin genes may be rendered non-functional separately or simultaneously with the introduction of human immunoglobulin loci by homologous recombination. In particular, homozygous deletion of the JH region prevents endogenous antibody production. The modified embryonic stem cells are expanded and microinjected into blastocysts to produce chimeric mice. The chimeric mice are then bred to produce homozygous offspring which express human antibodies. The transgenic mice are immunized in the normal fashion with a selected antigen, e.g., all or a portion of a polypeptide of the invention. Monoclonal antibodies directed against the antigen can be obtained from the immunized, transgenic mice using conventional hybridoma technology. The human immunoglobulin transgenes harbored by the transgenic mice rearrange during B cell differentiation, and subsequently undergo class switching and somatic mutation. Thus, using such a technique, it is possible to produce therapeutically useful IgG, IgA, IgM and IgE antibodies. For an overview of this technology for producing human antibodies, see Lonberg and Huszar (1995, Int. Rev. Immunol. 13:65-93). For a detailed discussion of this technology for producing human antibodies and human monoclonal antibodies and protocols for producing such antibodies, see, e.g., International Publication Nos. WO 98/24893, WO 96/34096, and WO 96/33735; and U.S. Pat. Nos. 5,413,923, 5,625,126, 5,633,425, 5,569,825, 5,661,016, 5,545,806, 5,814,318, and 5,939,598, which are incorporated by reference herein in their entirety. In addition, companies such as Abgenix, Inc. (Fremont, Calif.) and Genpharm (San Jose, Calif.) can be engaged to provide human antibodies directed against a selected antigen using technology similar to that described above.

U.S. Pat. No. 5,849,992, for example, describes a method of expressing an antibody in the mammary gland of a transgenic mammal. A transgene is constructed that includes a milk-specific promoter and nucleic acids encoding the antibody of interest and a signal sequence for secretion. The milk produced by females of such transgenic mammals contains the secreted antibody of interest. The antibody can be purified from the milk or, for some applications, used directly.

A chimeric antibody is a molecule in which different portions of the antibody are derived from different immunoglobulin molecules. Methods for producing chimeric antibodies are known in the art. See, e.g., Morrison, 1985, Science 229:1202; Oi et al., 1986, BioTechniques 4:214; Gillies et al., 1989, J. Immunol. Methods 125:191-202; and U.S. Pat. Nos. 5,807,715, 4,816,567, 4,816,397, and 6,331,415, which are incorporated herein by reference in their entirety.

A humanized antibody is an antibody, or a variant or fragment thereof, which is capable of binding to a predetermined antigen and which comprises a framework region having substantially the amino acid sequence of a human immunoglobulin and a CDR having substantially the amino acid sequence of a non-human immunoglobulin. A humanized antibody comprises substantially all of at least one, and typically two, variable domains (Fab, Fab′, F(ab′)2, Fabc, Fv) in which all or substantially all of the CDR regions correspond to those of a non-human immunoglobulin (i.e., donor antibody) and all or substantially all of the framework regions are those of a human immunoglobulin consensus sequence. In some embodiments, a humanized antibody also comprises at least a portion of an immunoglobulin constant region (Fc), typically that of a human immunoglobulin. Ordinarily, the antibody will contain both the light chain as well as at least the variable domain of a heavy chain. The antibody also may include the CH1, hinge, CH2, CH3, and CH4 regions of the heavy chain. The humanized antibody can be selected from any class of immunoglobulins, including IgM, IgG, IgD, IgA and IgE, and any isotype, including IgG₁, IgG₂, IgG₃ and IgG₄. Usually the constant domain is a complement-fixing constant domain where it is desired that the humanized antibody exhibit cytotoxic activity, and the class is typically IgG₁. Where such cytotoxic activity is not desirable, the constant domain may be of the IgG₂ class. The humanized antibody may comprise sequences from more than one class or isotype, and selecting particular constant domains to optimize desired effector functions is within the ordinary skill in the art.
The framework and CDR regions of a humanized antibody need not correspond precisely to the parental sequences, e.g., the donor CDR or the consensus framework may be mutagenized by substitution, insertion or deletion of at least one residue so that the CDR or framework residue at that site does not correspond to either the consensus or the import antibody. Such mutations, however, will not be extensive. Usually, at least 75% of the humanized antibody residues will correspond to those of the parental FR and CDR sequences, more often 90%, and most often greater than 95%. Humanized antibodies can be produced using a variety of techniques known in the art, including, but not limited to, CDR-grafting (European Patent No. EP 239,400; International Publication No. WO 91/09967; and U.S. Pat. Nos. 5,225,539, 5,530,101, and 5,585,089), veneering or resurfacing (European Patent Nos. EP 592,106 and EP 519,596; Padlan, 1991, Molecular Immunology 28(4/5):489-498; Studnicka et al., 1994, Protein Engineering 7(6):805-814; and Roguska et al., 1994, PNAS 91:969-973), chain shuffling (U.S. Pat. No. 5,565,332), and techniques disclosed in, e.g., U.S. Pat. No. 6,407,213, U.S. Pat. No. 5,766,886, WO 93/17105, Tan et al., 2002, J. Immunol. 169:1119-25, Caldas et al., 2000, Protein Eng. 13(5):353-60, Morea et al., 2000, Methods 20(3):267-79, Baca et al., 1997, J. Biol. Chem. 272(16):10678-84, Roguska et al., 1996, Protein Eng. 9(10):895-904, Couto et al., 1995, Cancer Res. 55(23 Supp):5973s-5977s, Couto et al., 1995, Cancer Res. 55(8):1717-22, Sandhu J S, 1994, Gene 150(2):409-10, and Pedersen et al., 1994, J. Mol. Biol. 235(3):959-73. Often, framework residues in the framework regions will be substituted with the corresponding residue from the CDR donor antibody to alter, preferably improve, antigen binding.
These framework substitutions are identified by methods well known in the art, e.g., by modeling of the interactions of the CDR and framework residues to identify framework residues important for antigen binding, and by sequence comparison to identify unusual framework residues at particular positions. (See, e.g., Queen et al., U.S. Pat. No. 5,585,089; and Riechmann et al., 1988, Nature 332:323, which are incorporated herein by reference in their entireties.)
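The 75%, 90%, and 95% correspondence figures above are residue-level identities between the humanized antibody and its parental FR and CDR sequences. A minimal Python sketch of that comparison follows; it assumes the humanized sequence and the parental sequence (donor CDRs placed in the human consensus framework) are already aligned position by position with no gaps, and the function name `percent_identity` is hypothetical.

```python
def percent_identity(humanized: str, parental: str) -> float:
    """Percent of positions at which two pre-aligned, equal-length sequences agree.

    No alignment is performed here; the caller supplies position-matched sequences.
    """
    if len(humanized) != len(parental):
        raise ValueError("sequences must be aligned to equal length")
    if not humanized:
        raise ValueError("empty sequences")
    matches = sum(1 for a, b in zip(humanized, parental) if a == b)
    return 100.0 * matches / len(humanized)


if __name__ == "__main__":
    # Toy 10-residue segments chosen for illustration only.
    identity = percent_identity("QVQLVESGAE", "QVQLVQSGAE")
    for threshold in (75, 90, 95):
        print(threshold, identity >= threshold)
```

With one mismatch in ten positions the toy pair scores 90%, meeting the 75% and 90% levels but not the greater-than-95% level.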

Single domain antibodies, for example, antibodies lacking the light chains, can be produced by methods well known in the art. See Riechmann et al., 1999, J. Immunol. Methods 231:25-38; Nuttall et al., 2000, Curr. Pharm. Biotechnol. 1(3):253-263; Muyldermans, 2001, J. Biotechnol. 74(4):277-302; U.S. Pat. No. 6,005,079; and International Publication Nos. WO 94/04678, WO 94/25591, and WO 01/44301, each of which is incorporated herein by reference in its entirety.

Further, the antibodies that specifically bind to an antigen can, in turn, be utilized to generate anti-idiotype antibodies that “mimic” an antigen using techniques well known to those skilled in the art. (See, e.g., Greenspan & Bona, 1989, FASEB J. 7(5):437-444; and Nisonoff, 1991, J. Immunol. 147(8):2429-2438.) Such antibodies can be used, alone or in combination with other therapies, in the prevention, treatment, or amelioration of one or more colorectal pathologies.

The invention encompasses polynucleotides comprising a nucleotide sequence encoding an antibody or fragment thereof that specifically binds to an antigen. The invention also encompasses polynucleotides that hybridize under high, intermediate, or lower stringency hybridization conditions to polynucleotides that encode an antibody of the invention.

The polynucleotides may be obtained, and the nucleotide sequence of the polynucleotides determined, by any method known in the art. The nucleotide sequences encoding known antibodies can be determined using methods well known in the art, i.e., nucleotide codons known to encode particular amino acids are assembled in such a way as to generate a nucleic acid that encodes the antibody. Such a polynucleotide encoding the antibody may be assembled from chemically synthesized oligonucleotides (e.g., as described in Kutmeier et al., 1994, BioTechniques 17:242), which, briefly, involves the synthesis of overlapping oligonucleotides containing portions of the sequence encoding the antibody, fragments, or variants thereof, annealing and ligating of those oligonucleotides, and then amplification of the ligated oligonucleotides by PCR.
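The overlapping-oligonucleotide step can be pictured with a short Python sketch that cuts a coding sequence into fragments sharing a fixed overlap and then checks that the fragments rejoin into the original sequence. For simplicity every fragment is written on the sense strand (in practice alternate fragments are made as the complementary strand so that neighbours anneal), the PCR amplification step is not modeled, and the 60-base length, 20-base overlap, and function names are illustrative assumptions.

```python
from typing import List


def split_into_overlapping_oligos(seq: str, oligo_len: int = 60, overlap: int = 20) -> List[str]:
    """Cut seq into fragments of at most oligo_len bases; neighbours share `overlap` bases."""
    if not 0 < overlap < oligo_len:
        raise ValueError("need 0 < overlap < oligo_len")
    step = oligo_len - overlap
    oligos = []
    start = 0
    while True:
        oligos.append(seq[start:start + oligo_len])
        if start + oligo_len >= len(seq):
            break
        start += step
    return oligos


def assemble(oligos: List[str], overlap: int) -> str:
    """Join fragments through their shared overlaps, mimicking annealing plus ligation."""
    assembled = oligos[0]
    for nxt in oligos[1:]:
        if assembled[-overlap:] != nxt[:overlap]:
            raise ValueError("adjacent fragments do not overlap as expected")
        assembled += nxt[overlap:]
    return assembled


if __name__ == "__main__":
    # Hypothetical 130-nt sequence, generated only for the demonstration.
    seq = "".join("ACGT"[(i * i + 3 * i) % 4] for i in range(130))
    fragments = split_into_overlapping_oligos(seq)
    print(len(fragments), assemble(fragments, 20) == seq)
```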

Alternatively, a polynucleotide encoding an antibody may be generated from nucleic acid from a suitable source. If a clone containing a nucleic acid encoding a particular antibody is not available, but the sequence of the antibody molecule is known, a nucleic acid encoding the immunoglobulin may be chemically synthesized or obtained from a suitable source (e.g., an antibody cDNA library, or a cDNA library generated from, or nucleic acid, in some embodiments poly A+ RNA, isolated from, any tissue or cells expressing the antibody, such as hybridoma cells selected to express an antibody of the invention) by PCR amplification using synthetic primers hybridizable to the 3′ and 5′ ends of the sequence, or by cloning using an oligonucleotide probe specific for the particular gene sequence to identify, e.g., a cDNA clone from a cDNA library that encodes the antibody. Amplified nucleic acids generated by PCR may then be cloned into replicable cloning vectors using any method well known in the art.

Once the nucleotide sequence of the antibody is determined, the nucleotide sequence of the antibody may be manipulated using methods well known in the art for the manipulation of nucleotide sequences, e.g., recombinant DNA techniques, site-directed mutagenesis, PCR, etc. (see, for example, the techniques described in Sambrook et al., 1990, Molecular Cloning, A Laboratory Manual, 2d Ed., Cold Spring Harbor Laboratory, Cold Spring Harbor, N.Y. and Ausubel et al., eds., 1998, Current Protocols in Molecular Biology, John Wiley & Sons, NY, which are both incorporated by reference herein in their entireties), to generate antibodies having a different amino acid sequence, for example to create amino acid substitutions, deletions, and/or insertions.

Once a polynucleotide encoding an antibody molecule, a heavy or light chain of an antibody, or a fragment thereof (in some embodiments, but not necessarily, containing the heavy or light chain variable domain) of the invention has been obtained, the vector for the production of the antibody molecule may be produced by recombinant DNA technology using techniques well known in the art.

In one preferred embodiment, monoclonal antibodies are produced in mammalian cells. Preferred mammalian host cells for expressing the cloned antibodies or antigen-binding fragments thereof include Chinese Hamster Ovary (CHO) cells (including dhfr⁻ CHO cells, described in Urlaub and Chasin, 1980, Proc. Natl. Acad. Sci. USA 77:4216-4220, used with a DHFR selectable marker, e.g., as described in Kaufman and Sharp, 1982, J. Mol. Biol. 159:601-621), lymphocytic cell lines, e.g., NS0 myeloma cells and SP2 cells, COS cells, and a cell from a transgenic animal, e.g., a transgenic mammal. For example, the cell may be a mammary epithelial cell.

In addition to the nucleic acid sequence encoding the diversified immunoglobulin domain, the recombinant expression vectors may carry additional sequences, such as sequences that regulate replication of the vector in host cells (e.g., origins of replication) and selectable marker genes. The selectable marker gene facilitates selection of host cells into which the vector has been introduced (see, e.g., U.S. Pat. Nos. 4,399,216, 4,634,665 and 5,179,017). For example, typically the selectable marker gene confers resistance to drugs, such as G418, hygromycin or methotrexate, on a host cell into which the vector has been introduced. Preferred selectable marker genes include the dihydrofolate reductase (DHFR) gene (for use in dhfr⁻ host cells with methotrexate selection/amplification) and the neo gene (for G418 selection).

In an exemplary system for recombinant expression of an antibody, or antigen-binding portion thereof, of the invention, a recombinant expression vector encoding both the antibody heavy chain and the antibody light chain is introduced into dhfr⁻ CHO cells by calcium phosphate-mediated transfection. Within the recombinant expression vector, the antibody heavy and light chain genes are each operatively linked to enhancer/promoter regulatory elements (e.g., derived from SV40, CMV, adenovirus and the like, such as a CMV enhancer/AdMLP promoter regulatory element or an SV40 enhancer/AdMLP promoter regulatory element) to drive high levels of transcription of the genes. The recombinant expression vector also carries a DHFR gene, which allows for selection of CHO cells that have been transfected with the vector using methotrexate selection/amplification. The selected transformant host cells are cultured to allow for expression of the antibody heavy and light chains, and intact antibody is recovered from the culture medium. Standard molecular biology techniques are used to prepare the recombinant expression vector, transfect the host cells, select for transformants, culture the host cells and recover the antibody from the culture medium. For example, some antibodies can be isolated by affinity chromatography with Protein A or Protein G.

For antibodies that include an Fc domain, the antibody production system preferably synthesizes antibodies in which the Fc region is glycosylated. For example, the Fc domain of IgG molecules is glycosylated at asparagine 297 in the CH2 domain. This asparagine is the site for modification with biantennary-type oligosaccharides. It has been demonstrated that this glycosylation is required for effector functions mediated by Fcγ receptors and complement C1q (Burton and Woof, 1992, Adv. Immunol. 51:1-84; Jefferis et al., 1998, Immunol. Rev. 163:59-76). In a preferred embodiment, the Fc domain is produced in a mammalian expression system that appropriately glycosylates the residue corresponding to asparagine 297. The Fc domain can also include other eukaryotic post-translational modifications.

Once an antibody molecule has been produced by recombinant expression, it may be purified by any method known in the art for purification of an immunoglobulin molecule, for example, by chromatography (e.g., ion exchange, affinity, particularly by affinity for the specific antigen after Protein A, and sizing column chromatography), centrifugation, differential solubility, or by any other standard technique for the purification of proteins. Further, the antibodies or fragments thereof may be fused to heterologous polypeptide sequences known in the art to facilitate purification.

Gene Therapy Techniques

Gene therapy refers to therapy performed by the administration to a subject of an expressed or expressible nucleic acid. Any of the methods for gene therapy available in the art can be used according to the present invention. Exemplary methods are described below.

In specific embodiments, one or more antisense oligonucleotides for one or more biomarkers of the invention are administered to prevent, treat, or ameliorate one or more colorectal pathologies, by way of gene therapy. In other embodiments, one or more nucleic acid molecules comprising nucleotides encoding one or more antibodies that specifically bind to one or more protein products of one or more biomarkers of the invention are administered to prevent, treat, or ameliorate one or more colorectal pathologies, by way of gene therapy. In other embodiments, one or more nucleic acid molecules comprising nucleotides encoding protein products of one or more biomarkers of the invention, or analogs, derivatives or fragments thereof, are administered to prevent, treat, or ameliorate one or more colorectal pathologies, by way of gene therapy. In yet other embodiments, one or more nucleic acid molecules comprising nucleotides encoding one or more dominant-negative polypeptides of one or more protein products of one or more biomarkers of the invention are administered to prevent, treat, or ameliorate one or more colorectal pathologies, by way of gene therapy.

For general reviews of the methods of gene therapy, see Goldspiel et al., 1993, Clinical Pharmacy 12:488-505; Wu and Wu, 1991, Biotherapy 3:87-95; Tolstoshev, 1993, Ann. Rev. Pharmacol. Toxicol. 32:573-596; Mulligan, 1993, Science 260:926-932; Morgan and Anderson, 1993, Ann. Rev. Biochem. 62:191-217; and May, 1993, TIBTECH 11(5):155-215. Methods commonly known in the art of recombinant DNA technology which can be used are described in Ausubel et al. (eds.), 1993, Current Protocols in Molecular Biology, John Wiley & Sons, NY; and Kriegler, 1990, Gene Transfer and Expression, A Laboratory Manual, Stockton Press, NY.

In one aspect, a composition of the invention comprises nucleic acid sequences encoding one or more antibodies that specifically bind to one or more protein products of one or more biomarkers of the invention, said nucleic acid sequences being part of expression vectors that express one or more antibodies in a suitable host. In particular, such nucleic acid sequences have promoters operably linked to the antibody coding sequences, said promoters being inducible or constitutive, and, optionally, tissue-specific.

In another aspect, a composition of the invention comprises nucleic acid sequences encoding dominant-negative polypeptides of one or more protein products of one or more biomarkers of the invention, said nucleic acid sequences being part of expression vectors that express dominant-negative polypeptides in a suitable host. In particular, such nucleic acid sequences have promoters operably linked to the dominant-negative coding sequences, said promoters being inducible or constitutive, and, optionally, tissue-specific. In another particular embodiment, nucleic acid molecules are used in which the dominant-negative coding sequences and any other desired sequences are flanked by regions that promote homologous recombination at a desired site in the genome, thus providing for intrachromosomal expression of the dominant-negative nucleic acids (Koller and Smithies, 1989, Proc. Natl. Acad. Sci. USA 86:8932-8935; Zijlstra et al., 1989, Nature 342:435-438).

Delivery of the nucleic acids into a patient may be either direct, in which case the patient is directly exposed to the nucleic acid or nucleic acid-carrying vectors, or indirect, in which case cells are first transformed with the nucleic acids in vitro and then transplanted into the patient. These two approaches are known, respectively, as in vivo and ex vivo gene therapy.

In a specific embodiment, the nucleic acid sequence is directly administered in vivo, where it is expressed to produce the encoded product. This can be accomplished by any of numerous methods known in the art, e.g., by constructing it as part of an appropriate nucleic acid expression vector and administering it so that it becomes intracellular, e.g., by infection using defective or attenuated retroviral or other viral vectors (see U.S. Pat. No. 4,980,286), by direct injection of naked DNA, by use of microparticle bombardment (e.g., a gene gun; Biolistic, Dupont), by coating with lipids or cell-surface receptors or transfecting agents, by encapsulation in liposomes, microparticles, or microcapsules, by administering it in linkage to a peptide which is known to enter the nucleus, or by administering it in linkage to a ligand subject to receptor-mediated endocytosis (see, e.g., Wu and Wu, 1987, J. Biol. Chem. 262:4429-4432) (which can be used to target cell types specifically expressing the receptors), etc. In another embodiment, nucleic acid-ligand complexes can be formed in which the ligand comprises a fusogenic viral peptide to disrupt endosomes, allowing the nucleic acid to avoid lysosomal degradation. In yet another embodiment, the nucleic acid can be targeted in vivo for cell-specific uptake and expression by targeting a specific receptor (see, e.g., International Publication Nos. WO 92/06180 dated Apr. 16, 1992 (Wu et al.); WO 92/22635 dated Dec. 23, 1992 (Wilson et al.); WO 92/20316 dated Nov. 26, 1992 (Findeis et al.); WO 93/14188 dated Jul. 22, 1993 (Clarke et al.); WO 93/20221 dated Oct. 14, 1993 (Young)). Alternatively, the nucleic acid can be introduced intracellularly and incorporated within host cell DNA for expression by homologous recombination (Koller and Smithies, 1989, Proc. Natl. Acad. Sci. USA 86:8932-8935; Zijlstra et al., 1989, Nature 342:435-438).

For example, a retroviral vector can be used. These retroviral vectors have been modified to delete retroviral sequences that are not necessary for packaging of the viral genome and integration into host cell DNA. The nucleic acid sequences encoding the antibodies of interest, or proteins of interest or fragments thereof, to be used in gene therapy are cloned into one or more vectors, which facilitate delivery of the gene into a patient. More detail about retroviral vectors can be found in Boesen et al., 1994, Biotherapy 6:291-302, which describes the use of a retroviral vector to deliver the mdr1 gene to hematopoietic stem cells in order to make the stem cells more resistant to chemotherapy. Other references illustrating the use of retroviral vectors in gene therapy are: Clowes et al., 1994, J. Clin. Invest. 93:644-651; Kiem et al., 1994, Blood 83:1467-1473; Salmons and Gunzberg, 1993, Human Gene Therapy 4:129-141; and Grossman and Wilson, 1993, Curr. Opin. in Genetics and Devel. 3:110-114.

Adenoviruses are other viral vectors that can be used in gene therapy. Adenoviruses are especially attractive vehicles for delivering genes to respiratory epithelia. Adenoviruses naturally infect respiratory epithelia, where they cause a mild disease. Other targets for adenovirus-based delivery systems are the liver, the central nervous system, endothelial cells, and muscle. Adenoviruses have the advantage of being capable of infecting non-dividing cells. Kozarsky and Wilson, 1993, Current Opinion in Genetics and Development 3:499-503 present a review of adenovirus-based gene therapy. Bout et al., 1994, Human Gene Therapy 5:3-10 demonstrated the use of adenovirus vectors to transfer genes to the respiratory epithelia of rhesus monkeys. Other instances of the use of adenoviruses in gene therapy can be found in Rosenfeld et al., 1991, Science 252:431-434; Rosenfeld et al., 1992, Cell 68:143-155; Mastrangeli et al., 1993, J. Clin. Invest. 91:225-234; PCT Publication WO94/12649; and Wang, et al., 1995, Gene Therapy 2:775-783. In a preferred embodiment, adenovirus vectors are used.

Adeno-associated virus (AAV) has also been proposed for use in gene therapy (Walsh et al., 1993, Proc. Soc. Exp. Biol. Med. 204:289-300; U.S. Pat. No. 5,436,146).

Another approach to gene therapy involves transferring a gene to cells in tissue culture by such methods as electroporation, lipofection, calcium phosphate-mediated transfection, or viral infection. Usually, the method of transfer includes the transfer of a selectable marker to the cells. The cells are then placed under selection to isolate those cells that have taken up and are expressing the transferred gene. Those cells are then delivered to a patient.

In this embodiment, the nucleic acid is introduced into a cell prior to administration in vivo of the resulting recombinant cell. Such introduction can be carried out by any method known in the art, including but not limited to transfection, electroporation, microinjection, infection with a viral or bacteriophage vector containing the nucleic acid sequences, cell fusion, chromosome-mediated gene transfer, microcell-mediated gene transfer, spheroplast fusion, etc. Numerous techniques are known in the art for the introduction of foreign genes into cells (see, e.g., Loeffler and Behr, 1993, Meth. Enzymol. 217:599-618; Cohen et al., 1993, Meth. Enzymol. 217:618-644; Cline, 1985, Pharmac. Ther. 29:69-92) and may be used in accordance with the present invention, provided that the necessary developmental and physiological functions of the recipient cells are not disrupted. The technique should provide for the stable transfer of the nucleic acid to the cell, so that the nucleic acid is expressible by the cell and preferably heritable and expressible by its cell progeny.

The resulting recombinant cells can be delivered to a patient by various methods known in the art. Recombinant blood cells (e.g., hematopoietic stem or progenitor cells) and/or intestinal cells are preferably administered intravenously. The amount of cells envisioned for use depends on the desired effect, patient state, etc., and can be determined by one skilled in the art.

Cells into which a nucleic acid can be introduced for purposes of gene therapy encompass any desired, available cell type, and include but are not limited to epithelial cells, endothelial cells, keratinocytes, intestinal cells, fibroblasts, muscle cells, hepatocytes; blood cells such as T lymphocytes, B lymphocytes, monocytes, macrophages, neutrophils, eosinophils, megakaryocytes, granulocytes; various stem or progenitor cells, in particular hematopoietic stem or progenitor cells, e.g., as obtained from bone marrow, umbilical cord blood, peripheral blood, fetal liver, etc.

In a preferred embodiment, the cell used for gene therapy is autologous to the patient.

In one embodiment in which recombinant cells are used in gene therapy, nucleic acid sequences encoding antibodies of interest, or proteins of interest or fragments thereof, are introduced into the cells such that they are expressible by the cells or their progeny, and the recombinant cells are then administered in vivo for therapeutic effect. In a specific embodiment, stem or progenitor cells are used. Any stem and/or progenitor cells which can be isolated and maintained in vitro can potentially be used in accordance with this embodiment of the present invention (see, e.g., International Publication No. WO 94/08598, dated Apr. 28, 1994; Stemple and Anderson, 1992, Cell 71:973-985; Rheinwald, 1980, Meth. Cell Bio. 21A:229; and Pittelkow and Scott, 1986, Mayo Clinic Proc. 61:771).

Promoters that may be used to control the expression of nucleic acid sequences encoding antibodies of interest, proteins of interest or fragments thereof may be constitutive, inducible or tissue-specific. Non-limiting examples include the SV40 early promoter region (Benoist and Chambon, 1981, Nature 290:304-310), the promoter contained in the 3′ long terminal repeat of Rous sarcoma virus (Yamamoto, et al., 1980, Cell 22:787-797), the herpes thymidine kinase promoter (Wagner et al., 1981, Proc. Natl. Acad. Sci. USA 78:1441-1445), the regulatory sequences of the metallothionein gene (Brinster et al., 1982, Nature 296:39-42); prokaryotic expression vectors such as the β-lactamase promoter (Villa-Kamaroff et al., 1978, Proc. Natl. Acad. Sci. USA 75:3727-3731), or the tac promoter (DeBoer et al., 1983, Proc. Natl. Acad. Sci. USA 80:21-25); see also “Useful proteins from recombinant bacteria” in Scientific American, 1980, 242:74-94; plant expression vectors comprising the nopaline synthetase promoter region (Herrera-Estrella et al., Nature 303:209-213) or the cauliflower mosaic virus 35S RNA promoter (Gardner et al., 1981, Nucl. Acids Res. 9:2871), and the promoter of the photosynthetic enzyme ribulose bisphosphate carboxylase (Herrera-Estrella et al., 1984, Nature 310:115-120); promoter elements from yeast or other fungi such as the Gal 4 promoter, the ADC (alcohol dehydrogenase) promoter, the PGK (phosphoglycerate kinase) promoter, and the alkaline phosphatase promoter; and the following animal transcriptional control regions, which exhibit tissue specificity and have been utilized in transgenic animals: the elastase I gene control region, which is active in pancreatic acinar cells (Swift et al., 1984, Cell 38:639-646; Ornitz et al., 1986, Cold Spring Harbor Symp. Quant. Biol. 50:399-409; MacDonald, 1987, Hepatology 7:425-515); the insulin gene control region, which is active in pancreatic beta cells (Hanahan, 1985, Nature 315:115-122); the immunoglobulin gene control region, which is active in lymphoid cells (Grosschedl et al., 1984, Cell 38:647-658; Adames et al., 1985, Nature 318:533-538; Alexander et al., 1987, Mol. Cell. Biol. 7:1436-1444); the mouse mammary tumor virus control region, which is active in testicular, breast, lymphoid and mast cells (Leder et al., 1986, Cell 45:485-495); the albumin gene control region, which is active in liver (Pinkert et al., 1987, Genes and Devel. 1:268-276); the alpha-fetoprotein gene control region, which is active in liver (Krumlauf et al., 1985, Mol. Cell. Biol. 5:1639-1648; Hammer et al., 1987, Science 235:53-58); the alpha 1-antitrypsin gene control region, which is active in the liver (Kelsey et al., 1987, Genes and Devel. 1:161-171); the beta-globin gene control region, which is active in myeloid cells (Magram et al., 1985, Nature 315:338-340; Kollias et al., 1986, Cell 46:89-94); the myelin basic protein gene control region, which is active in oligodendrocyte cells in the brain (Readhead et al., 1987, Cell 48:703-712); the myosin light chain-2 gene control region, which is active in skeletal muscle (Sani, 1985, Nature 314:283-286); and the gonadotropic releasing hormone gene control region, which is active in the hypothalamus (Mason et al., 1986, Science 234:1372-1378).

In a specific embodiment, the nucleic acid to be introduced for purposes of gene therapy comprises an inducible promoter operably linked to the coding region, such that expression of the nucleic acid is controllable by controlling the presence or absence of the appropriate inducer of transcription.

(S) Pharmaceutical Compositions

Biologically active compounds identified using the methods of the invention, or pharmaceutically acceptable salts thereof, can be administered to a patient, in some embodiments a mammal, including a human, having one or more colorectal pathologies. In a specific embodiment, a compound or pharmaceutically acceptable salt thereof is administered to a patient, in some embodiments a mammal, including a human, having one or more colorectal pathologies. In another embodiment, a compound or a pharmaceutically acceptable salt thereof is administered to a patient, in some embodiments a mammal, including a human, as a preventative measure against one or more colorectal pathologies. In accordance with these embodiments, the patient may be a child, an adult or elderly, wherein a “child” is a subject between 24 months and 18 years of age, an “adult” is a subject 18 years of age or older, and “elderly” is a subject 65 years of age or older.
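The age definitions above overlap (every elderly subject is also an adult) and leave subjects under 24 months uncategorized. A minimal sketch of how they could be applied, assuming age is recorded in whole months and that the child range excludes its 18-year endpoint because an 18-year-old is defined as an adult; the function name `patient_categories` is hypothetical.

```python
from typing import Set

# Age thresholds from the definitions above, expressed in months.
CHILD_MIN_MONTHS = 24
ADULT_MIN_MONTHS = 18 * 12
ELDERLY_MIN_MONTHS = 65 * 12


def patient_categories(age_months: int) -> Set[str]:
    """Hypothetical helper returning the age categories that apply to a patient.

    Assumes the child range excludes its 18-year endpoint, since an 18-year-old
    is defined as an adult. "elderly" is a subset of "adult", so a patient aged
    65 or older receives both labels. Patients younger than 24 months fall into
    none of the defined categories.
    """
    if age_months < 0:
        raise ValueError("age must be non-negative")
    categories: Set[str] = set()
    if CHILD_MIN_MONTHS <= age_months < ADULT_MIN_MONTHS:
        categories.add("child")
    if age_months >= ADULT_MIN_MONTHS:
        categories.add("adult")
    if age_months >= ELDERLY_MIN_MONTHS:
        categories.add("elderly")
    return categories


if __name__ == "__main__":
    print(sorted(patient_categories(70 * 12)))  # ['adult', 'elderly']
```

Returning a set rather than a single label keeps the overlap between "adult" and "elderly" explicit instead of forcing a choice between them.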

When administered to a patient, the compound or a pharmaceutically acceptable salt thereof is administered as a component of a composition that optionally comprises a pharmaceutically acceptable vehicle. The composition can be administered orally, or by any other convenient route, for example, by infusion or bolus injection, or by absorption through epithelial or mucocutaneous linings (e.g., oral mucosa, rectal, and intestinal mucosa, etc.), and may be administered together with another biologically active agent. Administration can be systemic or local. Various delivery systems are known, e.g., encapsulation in liposomes, microparticles, microcapsules, capsules, etc., and can be used to administer the compound and pharmaceutically acceptable salts thereof.

Methods of administration include but are not limited to intradermal, intramuscular, intraperitoneal, intravenous, subcutaneous, intranasal, epidural, oral, sublingual, intracerebral, intravaginal, transdermal, rectal, inhalational, or topical administration, particularly to the ears, nose, eyes, or skin. The mode of administration is left to the discretion of the practitioner. In most instances, administration will result in the release of the compound or a pharmaceutically acceptable salt thereof into the bloodstream.

In specific embodiments, it may be desirable to administer the compound or a pharmaceutically acceptable salt thereof locally. This may be achieved, for example, and not by way of limitation, by local infusion during surgery, topical application, e.g., in conjunction with a wound dressing after surgery, by injection, by means of a catheter, by means of a suppository, or by means of an implant, said implant being of a porous, non-porous, or gelatinous material, including membranes, such as sialastic membranes, or fibers. In a specific embodiment, a compound is administered locally to one or more sections of the gastrointestinal system.

In certain embodiments, it may be desirable to introduce the compound or a pharmaceutically acceptable salt thereof into the gastrointestinal system by any suitable route, including intraventricular, intrathecal and epidural injection. Intraventricular injection may be facilitated by an intraventricular catheter, for example, attached to a reservoir, such as an Ommaya reservoir.

Pulmonary administration can also be employed, e.g., by use of an inhaler or nebulizer, and formulation with an aerosolizing agent, or via perfusion in a fluorocarbon or synthetic pulmonary surfactant. In certain embodiments, the compound and pharmaceutically acceptable salts thereof can be formulated as a suppository, with traditional binders and vehicles such as triglycerides.

In another embodiment, the compound and pharmaceutically acceptable salts thereof can be delivered in a vesicle, in particular a liposome (see Langer, 1990, Science 249:1527-1533; Treat et al., in Liposomes in the Therapy of Infectious Disease and Cancer, Lopez-Berestein and Fidler (eds.), Liss, New York, pp. 353-365 (1989); Lopez-Berestein, ibid., pp. 317-327; see generally ibid.).

In yet another embodiment, the compound and pharmaceutically acceptable salts thereof can be delivered in a controlled-release system (see, e.g., Goodson, in Medical Applications of Controlled Release, supra, vol. 2, pp. 115-138 (1984)). Other controlled-release systems discussed in the review by Langer, 1990, Science 249:1527-1533 may be used. In one embodiment, a pump may be used (see Langer, supra; Sefton, 1987, CRC Crit. Ref. Biomed. Eng. 14:201; Buchwald et al., 1980, Surgery 88:507; Saudek et al., 1989, N. Engl. J. Med. 321:574). In another embodiment, polymeric materials can be used (see Medical Applications of Controlled Release, Langer and Wise (eds.), CRC Press, Boca Raton, Fla. (1974); Controlled Drug Bioavailability, Drug Product Design and Performance, Smolen and Ball (eds.), Wiley, New York (1984); Ranger and Peppas, 1983, J. Macromol. Sci. Rev. Macromol. Chem. 23:61; see also Levy et al., 1985, Science 228:190; During et al., 1989, Ann. Neurol. 25:351; Howard et al., 1989, J. Neurosurg. 71:105). In yet another embodiment, a controlled-release system can be placed in proximity to a target RNA of the compound or a pharmaceutically acceptable salt thereof, thus requiring only a fraction of the systemic dose.

The compounds described herein can be incorporated into pharmaceutical compositions suitable for administration. Such compositions typically comprise the active compound and a pharmaceutically acceptable carrier. As used herein, the language “pharmaceutically acceptable carrier” is intended to include any and all solvents, dispersion media, coatings, antibacterial and antifungal agents, isotonic and absorption delaying agents, and the like, compatible with pharmaceutical administration. The use of such media and agents for pharmaceutically active substances is well known in the art. Except insofar as any conventional media or agent is incompatible with the active compound, use thereof in the compositions is contemplated. Supplementary active compounds can also be incorporated into the compositions.

The invention includes methods for preparing pharmaceutical compositions for modulating the expression or activity of a polypeptide or nucleic acid of interest. Such methods comprise formulating a pharmaceutically acceptable carrier with an agent that modulates expression or activity of a polypeptide or nucleic acid of interest. Such compositions can further include additional active agents. Thus, the invention further includes methods for preparing a pharmaceutical composition by formulating a pharmaceutically acceptable carrier with an agent that modulates expression or activity of a polypeptide or nucleic acid of interest and one or more additional active compounds.

A pharmaceutical composition of the invention is formulated to be compatible with its intended route of administration. Examples of routes of administration include parenteral (e.g., intravenous, intradermal, subcutaneous), oral, inhalation, transdermal (topical), transmucosal, and rectal administration. Intravenous administration is preferred. Solutions or suspensions used for parenteral, intradermal, or subcutaneous application can include the following components: a sterile diluent such as water for injection, saline solution, fixed oils, polyethylene glycols, glycerine, propylene glycol or other synthetic solvents; antibacterial agents such as benzyl alcohol or methyl parabens; antioxidants such as ascorbic acid or sodium bisulfite; chelating agents such as ethylenediaminetetraacetic acid; buffers such as acetates, citrates or phosphates; and agents for the adjustment of tonicity such as sodium chloride or dextrose. The pH can be adjusted with acids or bases, such as hydrochloric acid or sodium hydroxide. The parenteral preparation can be enclosed in ampoules, disposable syringes or multiple-dose vials made of glass or plastic.

Pharmaceutical compositions suitable for injectable use include sterile aqueous solutions (where water soluble) or dispersions and sterile powders for the extemporaneous preparation of sterile injectable solutions or dispersions. For intravenous administration, suitable carriers include physiological saline, bacteriostatic water, Cremophor EL™ (BASF; Parsippany, N.J.) or phosphate buffered saline (PBS). In all cases, the composition must be sterile and should be fluid to the extent that easy syringability exists. It must be stable under the conditions of manufacture and storage and must be preserved against the contaminating action of microorganisms such as bacteria and fungi. The carrier can be a solvent or dispersion medium containing, for example, water, ethanol, polyol (for example, glycerol, propylene glycol, liquid polyethylene glycol, and the like), and suitable mixtures thereof. The proper fluidity can be maintained, for example, by the use of a coating such as lecithin, by the maintenance of the required particle size in the case of dispersion, and by the use of surfactants. Prevention of the action of microorganisms can be achieved by various antibacterial and antifungal agents, for example, parabens, chlorobutanol, phenol, ascorbic acid, thimerosal, and the like. In many cases, it will be preferable to include isotonic agents, for example, sugars, polyalcohols such as mannitol or sorbitol, or sodium chloride in the composition. Prolonged absorption of the injectable compositions can be brought about by including in the composition an agent which delays absorption, for example, aluminum monostearate and gelatin.

Sterile injectable solutions can be prepared by incorporating the active compound (e.g., a polypeptide or antibody) in the required amount in an appropriate solvent with one or a combination of ingredients enumerated above, as required, followed by filter sterilization. Generally, dispersions are prepared by incorporating the active compound into a sterile vehicle which contains a basic dispersion medium and the required other ingredients from those enumerated above. In the case of sterile powders for the preparation of sterile injectable solutions, the preferred methods of preparation are vacuum drying and freeze-drying, which yield a powder of the active ingredient plus any additional desired ingredient from a previously sterile-filtered solution thereof.

Oral compositions generally include an inert diluent or an edible carrier. They can be enclosed in gelatin capsules or compressed into tablets. For the purpose of oral therapeutic administration, the active compound can be incorporated with excipients and used in the form of tablets, troches, or capsules. Oral compositions can also be prepared using a fluid carrier for use as a mouthwash, wherein the compound in the fluid carrier is applied orally and swished and expectorated or swallowed.

Pharmaceutically compatible binding agents and/or adjuvant materials can be included as part of the composition. The tablets, pills, capsules, troches and the like can contain any of the following ingredients, or compounds of a similar nature: a binder such as microcrystalline cellulose, gum tragacanth or gelatin; an excipient such as starch or lactose; a disintegrating agent such as alginic acid, Primogel, or corn starch; a lubricant such as magnesium stearate or Sterotes; a glidant such as colloidal silicon dioxide; a sweetening agent such as sucrose or saccharin; or a flavoring agent such as peppermint, methyl salicylate, or orange flavoring.

For administration by inhalation, the compounds are delivered in the form of an aerosol spray from a pressurized container or dispenser which contains a suitable propellant, e.g., a gas such as carbon dioxide, or from a nebulizer.

Systemic administration can also be by transmucosal or transdermal means. For transmucosal or transdermal administration, penetrants appropriate to the barrier to be permeated are used in the formulation. Such penetrants are generally known in the art, and include, for example, for transmucosal administration, detergents, bile salts, and fusidic acid derivatives. Transmucosal administration can be accomplished through the use of nasal sprays or suppositories. For transdermal administration, the active compounds are formulated into ointments, salves, gels, or creams as generally known in the art.

The compounds can also be prepared in the form of suppositories (e.g., with conventional suppository bases such as cocoa butter and other glycerides) or retention enemas for rectal delivery.

In one embodiment, the active compounds are prepared with carriers that will protect the compound against rapid elimination from the body, such as a controlled-release formulation, including implants and microencapsulated delivery systems. Biodegradable, biocompatible polymers can be used, such as ethylene vinyl acetate, polyanhydrides, polyglycolic acid, collagen, polyorthoesters, and polylactic acid. Methods for preparation of such formulations will be apparent to those skilled in the art. The materials can also be obtained commercially from Alza Corporation and Nova Pharmaceuticals, Inc. Liposomal suspensions (including liposomes targeted to infected cells with monoclonal antibodies to viral antigens) can also be used as pharmaceutically acceptable carriers. These can be prepared according to methods known to those skilled in the art, for example, as described in U.S. Pat. No. 4,522,811.

It is especially advantageous to formulate oral or parenteral compositions in dosage unit form for ease of administration and uniformity of dosage. Dosage unit form as used herein refers to physically discrete units suited as unitary dosages for the individual to be treated, each unit containing a predetermined quantity of active compound calculated to produce the desired therapeutic effect in association with the required pharmaceutical carrier. The specification for the dosage unit forms of the invention is dictated by and directly dependent on the unique characteristics of the active compound and the particular therapeutic effect to be achieved, and the limitations inherent in the art of compounding such an active compound for the treatment of individuals.

For antibodies, the preferred dosage is 0.1 mg/kg to 100 mg/kg of body weight (more preferably, 0.1 to 20 mg/kg or 0.1 to 10 mg/kg). Generally, partially human antibodies and fully human antibodies have a longer half-life within the human body than other antibodies. Accordingly, lower dosages and less frequent administration are often possible. Modifications such as lipidation can be used to stabilize antibodies and to enhance uptake and tissue penetration (e.g., into the gastrointestinal system). A method for lipidation of antibodies is described by Cruikshank et al. (1997, J. Acquired Immune Deficiency Syndromes and Human Retrovirology 14:193).

In a specific embodiment, an effective amount of protein or polypeptide (i.e., an effective dosage) ranges from about 0.001 to 30 mg/kg body weight, preferably about 0.01 to 25 mg/kg body weight, more preferably about 0.1 to 20 mg/kg body weight, and even more preferably about 0.1 to 1.0 mg/kg, 1 to 10 mg/kg, 2 to 9 mg/kg, 3 to 8 mg/kg, 4 to 7 mg/kg, or 5 to 6 mg/kg body weight.
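As a worked illustration of the arithmetic only, the sketch below converts the mg/kg ranges quoted above into absolute amounts by multiplying each bound by body weight. The 70 kg body weight is an assumed example value that does not come from this application, and the function name is hypothetical; an actual dose would also depend on the individual factors discussed in this section.

```python
def dose_range_mg(body_weight_kg: float, low_mg_per_kg: float,
                  high_mg_per_kg: float) -> tuple[float, float]:
    """Convert a mg/kg dosage range into an absolute dose range in mg."""
    if body_weight_kg <= 0:
        raise ValueError("body weight must be positive")
    if low_mg_per_kg > high_mg_per_kg:
        raise ValueError("lower bound exceeds upper bound")
    return (low_mg_per_kg * body_weight_kg, high_mg_per_kg * body_weight_kg)


# mg/kg ranges quoted in the text for a protein or polypeptide.
QUOTED_RANGES = {
    "about 0.001 to 30 mg/kg": (0.001, 30.0),
    "preferably about 0.01 to 25 mg/kg": (0.01, 25.0),
    "more preferably about 0.1 to 20 mg/kg": (0.1, 20.0),
}

if __name__ == "__main__":
    body_weight = 70.0  # illustrative body weight in kg (an assumption)
    for label, (low, high) in QUOTED_RANGES.items():
        low_mg, high_mg = dose_range_mg(body_weight, low, high)
        print(f"{label}: {low_mg:g} to {high_mg:g} mg")
```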

The skilled artisan will appreciate that certain factors may influence the dosage required to effectively treat an individual, including but not limited to the severity of the disease or disorder, previous treatments, the general health and/or age of the individual, and other diseases present. Moreover, treatment of an individual with a therapeutically effective amount of a protein, polypeptide, or antibody can include a single treatment or, preferably, can include a series of treatments.

In addition to those compounds described above, the present invention encompasses the use of small molecules that modulate expression or activity of a nucleic acid or polypeptide of interest. Non-limiting examples of small molecules include peptides, peptidomimetics, amino acids, amino acid analogs, polynucleotides, polynucleotide analogs, nucleotides, nucleotide analogs, organic or inorganic compounds (i.e., including heteroorganic and organometallic compounds) having a molecular weight less than about 10,000 grams per mole, organic or inorganic compounds having a molecular weight less than about 5,000 grams per mole, organic or inorganic compounds having a molecular weight less than about 1,000 grams per mole, organic or inorganic compounds having a molecular weight less than about 500 grams per mole, and salts, esters, and other pharmaceutically acceptable forms of such compounds.

It is understood that appropriate doses of small molecule agents depend upon a number of factors within the ken of the ordinarily skilled physician, veterinarian, or researcher. The dose(s) of the small molecule will vary, for example, depending upon the identity, size, and condition of the individual or sample being treated, further depending upon the route by which the composition is to be administered, if applicable, and the effect which the practitioner desires the small molecule to have upon the nucleic acid or polypeptide of the invention. Exemplary doses include milligram or microgram amounts of the small molecule per kilogram of individual or sample weight (e.g., about 1 microgram per kilogram to about 500 milligrams per kilogram, about 100 micrograms per kilogram to about 5 milligrams per kilogram, or about 1 microgram per kilogram to about 50 micrograms per kilogram). It is furthermore understood that appropriate doses of a small molecule depend upon the potency of the small molecule with respect to the expression or activity to be modulated. Such appropriate doses may be determined using the assays described herein. When one or more of these small molecules is to be administered to an individual (e.g., a human) in order to modulate expression or activity of a polypeptide or nucleic acid of the invention, a physician, veterinarian, or researcher may, for example, prescribe a relatively low dose at first, subsequently increasing the dose until an appropriate response is obtained. In addition, it is understood that the specific dose level for any particular animal subject will depend upon a variety of factors including the activity of the specific compound employed, the age, body weight, general health, gender, and diet of the subject, the time of administration, the route of administration, the rate of excretion, any drug combination, and the degree of expression or activity to be modulated.

The pharmaceutical compositions can be included in a container, pack, or dispenser together with instructions for administration.

(T) Kits

The present invention provides kits for measuring the expression of the protein and RNA products of at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 12, at least 15, or all or any combination of the biomarkers of the invention. Such kits comprise materials and reagents required for measuring the expression of such protein and RNA products. In specific embodiments, the kits may further comprise one or more additional reagents employed in the various methods, such as: (1) reagents for purifying RNA from blood; (2) biomarker-specific primer sets for generating test nucleic acids; (3) dNTPs and/or rNTPs (either premixed or separate), optionally with one or more uniquely labeled dNTPs and/or rNTPs (e.g., biotinylated or Cy3- or Cy5-tagged dNTPs); (4) post-synthesis labeling reagents, such as chemically active derivatives of fluorescent dyes; (5) enzymes, such as reverse transcriptases, DNA polymerases, and the like; (6) various buffer media, e.g., hybridization, washing and/or enzymatic buffers; (7) labeled probe purification reagents and components, such as spin columns; (8) protein purification reagents; and (9) signal generation and detection reagents, e.g., streptavidin-alkaline phosphatase conjugate, chemifluorescent or chemiluminescent substrate, and the like. In particular embodiments, the kits comprise prelabeled, quality-controlled protein and/or RNA isolated from a sample (e.g., blood) or synthesized for use as a control. In some embodiments, kits can include a computer-readable medium containing a formula that uses data representing a level of products of at least one biomarker and generates an indication of the probability that a test subject has one or more colorectal pathologies, including one or more polyps or one or more subtypes of polyps. The formula of the computer-readable medium can be generated by using the methods outlined in section (G).
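The formula on the computer-readable medium is specified here only by reference to the methods of section (G). Purely as an illustration, and assuming that formula takes the common logistic-regression form p = 1 / (1 + e^-(b0 + Σ bi·xi)), applying it to measured biomarker levels could look like the minimal sketch below. The biomarker names, coefficients and intercept are hypothetical placeholders, not values from this application.

```python
import math
from typing import Mapping


def pathology_probability(levels: Mapping[str, float],
                          coefficients: Mapping[str, float],
                          intercept: float) -> float:
    """Map biomarker levels to a probability with a logistic (sigmoid) function.

    levels: measured value per biomarker (e.g., normalized expression).
    coefficients: weight per biomarker; every weighted biomarker must be measured.
    """
    missing = sorted(set(coefficients) - set(levels))
    if missing:
        raise KeyError(f"missing biomarker levels: {missing}")
    score = intercept + sum(w * levels[name] for name, w in coefficients.items())
    # Numerically stable sigmoid: avoids overflow in exp() for large |score|.
    if score >= 0:
        return 1.0 / (1.0 + math.exp(-score))
    z = math.exp(score)
    return z / (1.0 + z)


if __name__ == "__main__":
    # Hypothetical two-biomarker formula; names and numbers are placeholders.
    coefs = {"BIOMARKER_A": 0.8, "BIOMARKER_B": -0.5}
    p = pathology_probability({"BIOMARKER_A": 2.0, "BIOMARKER_B": 1.0}, coefs, -1.0)
    print(f"indicated probability: {p:.3f}")
```

The stable two-branch sigmoid is a design choice for robustness; it returns the same value as the textbook form wherever that form does not overflow.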

In some embodiments, the kits are PCR kits, real-time PCR kits and/or QRT-PCR kits. In other embodiments, the kits are nucleic acid arrays and protein arrays. Such kits according to the subject invention will at least comprise an array having associated protein or nucleic acid members of the invention and packaging means therefor. Alternatively, the protein or nucleic acid members of the invention may be prepackaged onto an array.

In one embodiment, the QRT-PCR kit includes the following: (a) two or more biomarker-specific primer sets, each set used to amplify a biomarker within the combination of biomarkers of the invention; (b) buffers and enzymes, including a reverse transcriptase; (c) one or more thermostable polymerases; and (d) SYBR® Green. In another embodiment, the kit of the invention can include (a) a reference control RNA and/or (b) a spiked control RNA. In another embodiment, the kit also includes a computer-readable medium containing a formula that uses data representing a level of products of at least one biomarker and generates an indication of the probability that a test subject has one or more colorectal pathologies, including one or more polyps or one or more subtypes of polyps. The formula of the computer-readable medium can be generated by using the methods outlined in section (G).

The invention provides kits that are useful for testing, detecting, screening, diagnosing, monitoring and prognosing one or more colorectal pathologies, including one or more polyps or one or more subtypes of polyps. For example, in a particular embodiment of the invention, a kit comprises a forward and a reverse primer designed to quantitate expression of all of the species of mRNA corresponding to a single distinct biomarker, where each of the distinct biomarkers is selected from the group identified in Tables 1, 2, 11, or 12. In certain embodiments, at least one of the primers of a primer set is designed to span an exon junction of a species of mRNA.
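Whether a primer spans an exon junction can be checked in mRNA coordinates once the exon lengths of the transcript are known. The minimal sketch below assumes a requirement of at least five primer bases on each side of the junction; that threshold, the exon lengths and the function names are illustrative assumptions, not parameters given in this application.

```python
from itertools import accumulate
from typing import List, Sequence


def junction_positions(exon_lengths: Sequence[int]) -> List[int]:
    """0-based mRNA positions at which each downstream exon begins."""
    return list(accumulate(exon_lengths[:-1]))


def spans_exon_junction(primer_start: int, primer_length: int,
                        junctions: Sequence[int], min_overlap: int = 5) -> bool:
    """True if the primer footprint covers some exon junction with at least
    `min_overlap` bases on each side of it.

    primer_start is the 0-based leftmost position of the primer footprint on
    the mRNA (for a reverse primer, the footprint of its complement).
    """
    primer_end = primer_start + primer_length  # exclusive end
    return any(primer_start + min_overlap <= j <= primer_end - min_overlap
               for j in junctions)


if __name__ == "__main__":
    junctions = junction_positions([120, 85, 200])  # hypothetical 3-exon mRNA
    print(junctions)
    print(spans_exon_junction(110, 20, junctions))  # 10 bases on each side
    print(spans_exon_junction(117, 20, junctions))  # only 3 bases upstream
```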

The invention includes kits that are useful for testing, detecting, screening, diagnosing, monitoring and prognosing one or more colorectal pathologies, including one or more types or subtypes of polyps, based upon the expression levels of protein or RNA products of at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, or all or any combination of the biomarkers of the invention in a sample.

The invention includes kits useful for monitoring the efficacy of one or more therapies that an individual is undergoing based upon the expression of protein or RNA products of at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, or all or any combination of the biomarkers of the invention in a sample.

The invention includes kits useful for determining whether an individual will be responsive to a therapy based upon the expression of protein or RNA products of at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, or all or any combination of the biomarkers of the invention in a sample.

The invention includes kits for measuring the expression of RNA products of at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, or all or any combination of the biomarkers of the invention in a sample. In a specific embodiment, such kits comprise materials and reagents that are necessary for measuring the expression of RNA products of a biomarker of the invention. For example, a microarray or QRT-PCR kit may be produced for detecting one or more colon pathologies, including polyps or one or more subtypes of polyps, that contains only those reagents and materials necessary for measuring the levels of RNA products of at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, or all or any combination of the biomarkers of the invention. Alternatively, in some embodiments, the kits can comprise materials and reagents that are not limited to those required to measure the levels of RNA products of 1, 2, 3, 4, 5, 6, 7, 8 or all or any combination of the biomarkers of the invention. For example, a microarray kit may contain reagents and materials necessary for measuring the levels of RNA products not necessarily associated with or indicative of one or more colorectal pathologies, in addition to reagents and materials necessary for measuring the levels of the RNA products of at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, or all or any combination of the biomarkers of the invention. In a specific embodiment, a microarray or QRT-PCR kit contains reagents and materials necessary for measuring the levels of RNA products of at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, or all or any combination of the biomarkers of the invention, and of 1, 2, 3, 4, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 125, 150, 175, 200, 225, 250, 300, 350, 400, 450, or more genes other than the biomarkers of the invention, or of 1-10, 1-100, 1-150, 1-200, 1-300, 1-400, 1-500, 1-1000, 25-100, 25-200, 25-300, 25-400, 25-500, 25-1000, 100-150, 100-200, 100-300, 100-400, 100-500, 100-1000, or 500-1000 genes other than the biomarkers of the invention.

For nucleic acid microarray kits, the kits generally comprise probes attached or localized to a support surface. The probes may be labeled with a detectable label. In a specific embodiment, the probes are specific for an exon(s), an intron(s), an exon junction(s), or an exon-intron junction(s) of RNA products of 1, 2, 3, 4, 5, 6, 7, 8, all or any combination of the biomarkers of the invention. The microarray kits may comprise instructions for performing the assay and methods for interpreting and analyzing the data resulting from the performance of the assay. In a specific embodiment, the kits comprise instructions for detecting or diagnosing one or more colorectal pathologies, including one or more polyps or one or more subtypes of polyps. The kits may also comprise hybridization reagents and/or reagents necessary for detecting a signal produced when a probe hybridizes to a target nucleic acid sequence. Generally, the materials and reagents for the microarray kits are in one or more containers. Each component of the kit is generally in its own suitable container. In another embodiment, the kit also includes a computer-readable medium containing a formula that uses data representing a level of products of at least one biomarker and generates an indication of the probability that a test subject has one or more colorectal pathologies or a subtype of colorectal pathology, including a polyp or one or more subtypes of polyps. The formula of the computer-readable medium can be generated by using the methods outlined in section (G).

For QRT-PCR kits, the kits generally comprise pre-selected primers specific for particular RNA products (e.g., an exon(s), an intron(s), an exon junction(s), and an exon-intron junction(s)) of 1, 2, 3, 4, 5, 6, 7, 8, or all or any combination of the biomarkers of the invention. The QRT-PCR kits may also comprise enzymes suitable for reverse transcribing and/or amplifying nucleic acids (e.g., polymerases such as Taq), and deoxynucleotides and buffers needed for the reaction mixture for reverse transcription and amplification. The QRT-PCR kits may also comprise probes specific for RNA products of 1, 2, 3, 4, 5, 6, 7, 8, or all or any combination of the biomarkers of the invention. The probes may or may not be labeled with a detectable label (e.g., a fluorescent label). Each component of the QRT-PCR kit is generally in its own suitable container. In another embodiment, the kit also includes a computer-readable medium containing a formula that uses data representing a level of products of at least one biomarker and generates an indication of the probability that a test subject has one or more colorectal pathologies or a subtype of colorectal pathology, including a polyp or one or more subtypes of polyps. The formula of the computer-readable medium can be generated by using the methods outlined in section (G).

Thus, these kits generally comprise distinct containers suitable for each individual reagent, enzyme, primer and probe. Further, the QRT-PCR kits may comprise instructions for performing the assay and methods for interpreting and analyzing the data resulting from the performance of the assay. In a specific embodiment, the kits contain instructions for diagnosing or detecting one or more colorectal pathologies, including one or more polyps or one or more subtypes of polyps.

In a specific embodiment, the kit is a QRT-PCR kit. Such a kit may comprise a 96-well plate and reagents and materials necessary for SYBR Green detection. The kit may comprise reagents and materials so that beta-actin can be used to normalize the results. The kit may also comprise controls such as water, phosphate-buffered saline, and phage MS2 RNA. Further, the kit may comprise instructions for performing the assay and methods for interpreting and analyzing the data resulting from the performance of the assay. In a specific embodiment, the instructions state that the level of RNA products of 1, 2, 3, 4, 5, 6, 7, 8, all or any combination of the biomarkers of the invention should be examined at two concentrations that differ by, e.g., 5-fold to 10-fold.
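The text does not spell out how beta-actin normalization is performed. One widely used approach is the comparative Ct (2^-ΔΔCt) method, sketched below together with the Ct shift expected when a sample is examined at two concentrations a known fold apart. Both calculations assume near-100% amplification efficiency for the target and reference assays, which this application does not state; the function names are illustrative and the Ct values in the checks are made up.

```python
import math


def delta_ct(ct_target: float, ct_reference: float) -> float:
    """Normalize a target Ct to a reference gene such as beta-actin."""
    return ct_target - ct_reference


def fold_change(ct_target_test: float, ct_ref_test: float,
                ct_target_control: float, ct_ref_control: float) -> float:
    """Comparative Ct (2^-ddCt) fold change of a test sample vs. a control.

    Assumes both assays amplify with ~100% efficiency (doubling per cycle).
    """
    ddct = (delta_ct(ct_target_test, ct_ref_test)
            - delta_ct(ct_target_control, ct_ref_control))
    return 2.0 ** (-ddct)


def expected_ct_shift(fold_dilution: float, efficiency: float = 1.0) -> float:
    """Expected rise in Ct when template is diluted `fold_dilution`-fold.

    efficiency=1.0 means perfect doubling each cycle.
    """
    return math.log(fold_dilution) / math.log(1.0 + efficiency)


if __name__ == "__main__":
    print(fold_change(24.0, 18.0, 26.0, 18.0))  # hypothetical Ct values
    for fold in (5, 10):
        print(fold, round(expected_ct_shift(fold), 2))
```

Under that efficiency assumption, a 5-fold dilution should raise Ct by about 2.3 cycles and a 10-fold dilution by about 3.3 cycles, so comparing the observed shift with these values is one possible use of the two-concentration measurement.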

For antibody-based kits, the kit can comprise, for example: (1) a first antibody (which may or may not be attached to a support) which binds to a protein of interest (e.g., the protein products of 1, 2, 3, 4, 5, 6, 7, 8, all or any combination of the biomarkers of the invention); and, optionally, (2) a second, different antibody which binds to either the protein or the first antibody and is conjugated to a detectable label (e.g., a fluorescent label, radioactive isotope or enzyme). The antibody-based kits may also comprise beads for conducting an immunoprecipitation. Each component of the antibody-based kits is generally in its own suitable container. Thus, these kits generally comprise distinct containers suitable for each antibody. Further, the antibody-based kits may comprise instructions for performing the assay and methods for interpreting and analyzing the data resulting from the performance of the assay. In a specific embodiment, the kits contain instructions for diagnosing or detecting one or more colorectal pathologies, including one or more polyps or one or more subtypes of polyps. In another embodiment, the kit contains instructions, in the form of a computer-readable medium, for applying the data to a formula. Said computer-readable medium can also contain instructions for interpreting and analyzing the data resulting from the performance of the assay.

(U) SNPs

A Single Nucleotide Polymorphism (SNP) is a single nucleotide variation at a specific location in the genome of different individuals. SNPs are found in both coding and non-coding regions of genomic DNA. In spite of the paucity of scorable phenotypes, SNPs are found in large numbers throughout the human genome (Cooper et al., Hum Genet 69:201-205, 1985). SNPs are stable genetic variations frequently found in genes, and contribute to the wide range of phenotypic variations found in organisms. SNPs can be of predictive value in identifying many genetic diseases, as well as phenotypic characteristics. It is known, for example, that certain SNPs result in disease-causing mutations, such as the SNP correlated with heritable breast cancer (Cannon-Albright and Skolnick, Semin. Oncol. 23:1-5, 1996).

A SNP may be identified in the DNA of an organism by a number of methods well known to those of skill in the art, including but not limited to identifying the SNP by DNA sequencing, by amplifying a PCR product and sequencing the PCR product, by Oligonucleotide Ligation Assay (OLA), by Doublecode OLA, by Single Base Extension Assay, by allele-specific primer extension, or by mismatch hybridization.

The instant invention offers a more focused and efficient method of screening SNPs to identify those SNPs which are specifically associated with one or more colorectal pathologies, by having identified a selection of genes which are differentially expressed in blood from individuals having one or more colorectal pathologies as compared with individuals not having said one or more colorectal pathologies. In one aspect of the invention, the SNPs to be screened are those SNPs found in the genes listed in Tables 2 and 6. In another aspect of the invention, novel SNPs can be identified in the disease-associated biomarkers using the methods listed above.

In particular, this invention focuses on methods for identifying those SNPs which are associated with one or more colorectal pathologies by screening only those SNPs in the biomarkers identified herein. Those SNPs which are identified using the methods disclosed herein will be convenient diagnostic markers.

More specifically, a SNP is considered to be a polyp associated SNP if those individuals having one or more colorectal pathologies have a different polymorphism at the SNP locus than those individuals not having the one or more colorectal pathologies. Further, a particular SNP is considered to be diagnostic for one or more colorectal pathologies if a particular polymorphism of the SNP is found to be present at a statistically significantly higher frequency in those individuals having one or more colorectal pathologies than in those individuals not having the one or more colorectal pathologies. Indices of statistical significance include p<0.10, p<0.05, p<0.01, and p<0.001. This invention includes methods of determining the diagnostic value of SNPs with respect to diagnosing or detecting one or more colorectal pathologies.
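The frequency comparison described above can be illustrated with a Pearson chi-square test on a 2×2 table of allele counts in individuals with and without colorectal pathology. This is one common choice of test, assumed here for illustration; the application names significance thresholds but not a specific test. The sketch uses no continuity correction and relies on the identity that, for one degree of freedom, the chi-square tail probability equals erfc(√(χ²/2)). The function name and the counts in the checks are hypothetical.

```python
import math


def allele_association(case_counts: tuple[int, int],
                       control_counts: tuple[int, int]) -> tuple[float, float]:
    """Pearson chi-square (1 df, no continuity correction) on allele counts.

    case_counts / control_counts: (count of allele A, count of allele B).
    Returns (chi2, p_value).
    """
    a, b = case_counts
    c, d = control_counts
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("each row and column must have a non-zero total")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


if __name__ == "__main__":
    chi2, p = allele_association((60, 40), (40, 60))  # hypothetical counts
    for threshold in (0.10, 0.05, 0.01, 0.001):
        print(f"p < {threshold}: {p < threshold}")
```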

As would be understood, a preferred sample is blood, but these methods encompass any samples from which DNA can be obtained, including epithelial cells, buccal cells, hair, saliva, tissue cells and the like. There are a variety of available methods for obtaining and storing tissue and/or blood samples. These alternatives allow tissue and blood samples to be stored and transported in a form suitable for the recovery of genomic DNA from the samples for genotype analysis. DNA samples can be collected and stored on a variety of solid media, including Whatman paper, Guthrie cards, tubes, swabs, filter paper, slides, or other containers. When whole blood is collected on filter paper, for example, it can be dried and stored at room temperature.

The blood sample may be any one of various types of blood samples, including, for example, a sample of serum-depleted blood, a sample of erythrocyte-depleted blood, a sample of serum-depleted and erythrocyte-depleted blood, a sample of lysed blood, a blood sample which has not been fractionated into cell types, and a sample of unfractionated cells of a lysed blood sample. Examples of blood samples are described in Example 1 of the Examples section below.

In another aspect of the invention, polyp associated SNPs can be identified from RNA transcripts of the polyp biomarker genes listed in Tables 2 and 6, instead of from genomic DNA. In one embodiment, RNA is isolated from a sample such as blood from individuals with and without the given disease or disorder, and transcripts encoded by these polyp biomarker genes are reverse transcribed into cDNA. The cDNA is amplified and analyzed to determine the presence of SNPs in the polyp biomarker genes. A polyp associated SNP can then be identified by comparing the distribution of each of the SNPs identified in the differentially expressed polyp associated biomarker gene(s) between individuals having one or more colorectal pathologies and individuals who do not have one or more colorectal pathologies. In a further variation of this embodiment, instead of analyzing cDNA for the presence of SNPs, the RNA transcripts of the disease-specific biomarker genes, or their amplified products, are analyzed for the presence of SNPs.

Analysis of genomic DNA comprising the polyp biomarker genes has the potential to identify SNPs in the coding region as well as in regulatory regions, the latter of which may contribute to the change in expression levels of the gene. Analysis of cDNA-encoded SNPs has the potential to identify only SNPs in the coding region of the polyp biomarker genes, which may be instrumental in deciphering protein-based mechanisms of polyp formation. Methods of analyzing cDNA-encoded SNPs can be carried out by analyzing the cDNA generated in the reverse transcription PCR reactions described herein that are used to identify the level of the biomarker in samples from patients and non-patients.

A polyp associated SNP may be identified in the DNA of the polyp biomarker genes by a number of methods well known to those of skill in the art (see, for example, U.S. Pat. Nos. 6,221,592 and 5,679,524), including but not limited to identifying the SNP by PCR or DNA amplification, Oligonucleotide Ligation Assay (OLA) (Landegren et al., Science 241:1077, 1988), Doublecode OLA, mismatch hybridization, mass spectrometry, Single Base Extension Assay (U.S. Pat. No. 6,638,722), RFLP detection based on allele-specific restriction-endonuclease cleavage (Kan and Dozy, Lancet ii: 910-912, 1978), hybridization with allele-specific oligonucleotide probes (Wallace et al., Nucl Acids Res 6:3543-3557, 1978), including immobilized oligonucleotides (Saiki et al., Proc Natl Acad Sci USA 86:6230-6234, 1989) or oligonucleotide arrays (Maskos and Southern, Nucl Acids Res 21:2269-2270, 1993), allele-specific PCR (Newton et al., Nucl Acids Res 17:2503-16, 1989), mismatch-repair detection (MRD) (Faham and Cox, Genome Res 5:474-482, 1995), binding of MutS protein (Wagner et al., Nucl Acids Res 23:3944-3948, 1995), single-strand-conformation-polymorphism detection (Orita et al., Genomics 5:874-879, 1989), RNAase cleavage at mismatched base-pairs (Myers et al., Science 230:1242, 1985), chemical (Cotton et al., Proc Natl Acad Sci USA 85:4397-4401, 1988) or enzymatic (Youil et al., Proc Natl Acad Sci USA 92:87-91, 1995) cleavage of heteroduplex DNA, methods based on allele-specific primer extension (Syvanen et al., Genomics 8:684-692, 1990), genetic bit analysis (GBA) (Nikiforov et al., Nucl Acids Res 22:4167-4175, 1994), and radioactive and/or fluorescent DNA sequencing using standard procedures well known in the art.
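Of the listed methods, RFLP detection depends on a SNP creating or abolishing a restriction-endonuclease recognition site, so that digestion of an amplified product cuts one allele but not the other. A minimal in silico sketch of that check follows; it handles only exact, non-degenerate recognition sequences searched on one strand (adequate for palindromic sites such as EcoRI's GAATTC), and the example sequence and function names are hypothetical.

```python
from typing import List


def recognition_sites(seq: str, site: str) -> List[int]:
    """0-based start positions of every exact match of `site` in `seq`."""
    seq, site = seq.upper(), site.upper()
    return [i for i in range(len(seq) - len(site) + 1)
            if seq[i:i + len(site)] == site]


def snp_changes_sites(ref_seq: str, snp_index: int, alt_base: str,
                      site: str) -> bool:
    """True if substituting `alt_base` at `snp_index` creates or destroys a site."""
    if not 0 <= snp_index < len(ref_seq):
        raise IndexError("SNP position lies outside the sequence")
    alt_seq = ref_seq[:snp_index] + alt_base + ref_seq[snp_index + 1:]
    return recognition_sites(ref_seq, site) != recognition_sites(alt_seq, site)


if __name__ == "__main__":
    ref = "ACGGAATTCTTA"  # hypothetical amplicon containing an EcoRI site
    print(recognition_sites(ref, "GAATTC"))
    print(snp_changes_sites(ref, 5, "G", "GAATTC"))   # SNP inside the site
    print(snp_changes_sites(ref, 10, "C", "GAATTC"))  # SNP outside the site
```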

The instant methods of screening a subset of SNPs to identify polyp associated SNPs in polyp biomarker genes also encompass non-PCR methods of DNA amplification. These methods include ligase chain reaction ("LCR"), disclosed in European Patent Application No. 320,308; Qbeta Replicase, described in PCT Patent Application No. PCT/US87/00880; isothermal amplification methods (Walker et al., Nucleic Acids Res 20(7):1691-6, 1992); Strand Displacement Amplification (SDA), described in U.S. Pat. Nos. 5,712,124, 5,648,211 and 5,455,166; Cyclic Probe Reaction; Transcription-Based Amplification, including nucleic acid sequence based amplification (NASBA) and 3SR (Kwoh et al., Proc Natl Acad Sci USA, 86:1173-77, 1989; PCT Patent Application WO 88/10315 et al., 1989); other amplification methods, as described in British Patent Application No. GB 2,202,328 and in PCT Patent Application No. PCT/US89/01025; Davey et al., European Patent Application No. 329,822; Miller et al., PCT Patent Application WO 89/06700; "RACE" and "one-sided PCR™," described in Frohman, In: PCR Protocols: A Guide To Methods And Applications, Academic Press, N.Y., 1990; and methods based on ligation of two (or more) oligonucleotides in the presence of nucleic acid having the sequence of the resulting "di-oligonucleotide," described in Wu et al., Genomics 4:560-569, 1989.

While it is generally contemplated that the polymerase employed will be thermostable, non-thermostable polymerases may also be employed in the context of the present disclosure. Exemplary polymerases and nucleic acid modifying enzymes that may be used in the context of the disclosure include the thermostable DNA polymerases OmniBase Sequencing Enzyme, Pfu DNA Polymerase, Taq DNA Polymerase, Taq DNA Polymerase (Sequencing Grade), TaqBead Hot Start Polymerase, AmpliTaq Gold, Vent DNA Polymerase, Tub DNA Polymerase, TaqPlus DNA Polymerase, Tfl DNA Polymerase, Tli DNA Polymerase and Tth DNA Polymerase; the DNA polymerases DNA Polymerase I, Klenow Fragment (Exonuclease Minus), DNA Polymerase I Large (Klenow) Fragment, Terminal Deoxynucleotidyl Transferase, T7 DNA Polymerase and T4 DNA Polymerase; the reverse transcriptases AMV Reverse Transcriptase and M-MLV Reverse Transcriptase; T4 DNA ligase; and T4 polynucleotide kinase.

Recognition moieties incorporated into primers, incorporated into the amplified product during amplification, or attached to probes are useful in the identification of the amplified molecules. A number of different labels may be used for this purpose, such as, for example: fluorophores, chromophores, radio-isotopes, enzymatic tags, antibodies, chemiluminescence, electroluminescence, affinity labels, etc. One of skill in the art will recognize that these and other labels not mentioned herein can also be used with success in this disclosure. Examples of affinity labels include but are not limited to the following: an antibody, an antibody fragment, a receptor protein, a hormone, biotin, DNP, or any polypeptide/protein molecule that binds to an affinity label and may be used for separation of the amplified gene. Examples of enzyme tags include enzymes such as urease, alkaline phosphatase, or peroxidase. Additionally, colorimetric indicator substrates can be employed to provide a detection means visible to the human eye or spectrophotometrically, to identify specific hybridization with complementary nucleic acid-containing samples. All these examples are generally known in the art, and the skilled artisan will recognize that the present disclosure is not limited to the examples described above. The following fluorophores are specifically contemplated to be useful in the present disclosure: Alexa 350, Alexa 430, AMCA, BODIPY 630/650, BODIPY 650/665, BODIPY-FL, BODIPY-R6G, BODIPY-TMR, BODIPY-TRX, Cascade Blue, Cy2, Cy3, Cy5, 6-FAM, Fluorescein, HEX, 6-JOE, Oregon Green 488, Oregon Green 500, Oregon Green 514, Pacific Blue, REG, Rhodamine Green, Rhodamine Red, ROX, TAMRA, TET, Tetramethylrhodamine, and Texas Red.

In the context of the present disclosure, it is specifically contemplated that the DNA amplification products of the disclosed methods may be analyzed using DNA chips or microarrays in order to detect SNPs. The amplified DNA products may be passed over a DNA chip or microarray encompassing oligonucleotide or polynucleotide probes. The ability or inability of the amplified DNA to hybridize to the microarray or DNA chip will facilitate the characterization of the SNPs present in the biomarker genes encoding the transcripts present in the sample.
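One common way to read out such hybridization is a pair of allele-specific probes per SNP, with the genotype called from the relative signal of the two probes. The sketch below illustrates that idea under stated assumptions: the minimum total signal, the heterozygote band of 0.3 to 0.7 for the allele-B fraction, and the function name are hypothetical, and in practice such cut-offs would be set from control samples of known genotype.

```python
def call_genotype(signal_a: float, signal_b: float,
                  min_total: float = 1000.0,
                  het_band: tuple[float, float] = (0.3, 0.7)) -> str:
    """Call a biallelic genotype from two allele-specific probe signals.

    Returns "AA", "AB", "BB", or "no call" when total signal is too weak.
    All thresholds are illustrative assumptions, not validated cut-offs.
    """
    a = max(signal_a, 0.0)  # clip negative background-subtracted values
    b = max(signal_b, 0.0)
    total = a + b
    if total < min_total:
        return "no call"
    fraction_b = b / total
    low, high = het_band
    if fraction_b < low:
        return "AA"
    if fraction_b > high:
        return "BB"
    return "AB"


if __name__ == "__main__":
    for pair in [(9000, 500), (4000, 4200), (300, 8000), (100, 200)]:
        print(pair, call_genotype(*pair))
```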

The following non-limiting examples are illustrative of the present invention.

EXAMPLES

Example 1

RNA Isolation from Unfractionated Whole Blood

(a) Centrifuged Lysed Blood (Serum Reduced, Erythrocyte Reduced Blood Sample)

Ten ml of peripheral whole blood was collected in EDTA Vacutainer tubes (Becton Dickinson, Franklin Lakes, N.J.) and stored on ice until processing (within 6 hours). Upon centrifugation, blood samples separated into plasma (including the buffy coat) and red blood cell layers. The plasma was removed, and a hypotonic buffer (1.6 mM EDTA, 10 mM KHCO3, 153 mM NH4Cl, pH 7.4) was added at a 3:1 volume ratio to lyse the red blood cells. The mixture was centrifuged to yield a cell pellet, which was dissolved and homogenized in 1.0 ml of TRIzol® Reagent (Invitrogen Corp., Carlsbad, Calif.) and 0.2 ml of chloroform according to the manufacturer's instructions. After centrifugation, isopropanol was added to the aqueous phase at a 1:1 ratio and the RNA was allowed to precipitate at −20° C. Subsequent centrifugation yielded an RNA pellet that was resuspended in water for experimental use. RNA quality was assessed on Agilent 2100 Bioanalyzer RNA 6000 Nano Chips as specified by the manufacturer, and RNA quantity was determined by absorbance at 260 nm in a Beckman-Coulter DU640 Spectrophotometer.
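RNA quantity in this example is determined from absorbance at 260 nm. The sketch below uses the widely used conversion that an A260 of 1.0 in a 1 cm path corresponds to roughly 40 µg/ml of RNA; the readings, dilution factor and resuspension volume are hypothetical, and the function names are illustrative.

```python
RNA_UG_PER_ML_PER_A260 = 40.0  # approx. ug/ml of RNA giving A260 = 1.0 (1 cm path)


def rna_concentration_ug_per_ml(a260: float, dilution_factor: float = 1.0) -> float:
    """RNA concentration of the undiluted sample from a (diluted) A260 reading."""
    return a260 * RNA_UG_PER_ML_PER_A260 * dilution_factor


def rna_yield_ug(a260: float, dilution_factor: float, volume_ul: float) -> float:
    """Total RNA (ug) in a resuspension of `volume_ul` microlitres."""
    return rna_concentration_ug_per_ml(a260, dilution_factor) * volume_ul / 1000.0


if __name__ == "__main__":
    # Hypothetical reading: A260 of 0.25 for a 1:50 dilution, pellet in 50 ul water.
    print(rna_concentration_ug_per_ml(0.25, 50), "ug/ml")
    print(rna_yield_ug(0.25, 50, 50), "ug total")
```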

(b) Lysed Blood

10 ml whole blood is obtained in a Vacutainer or any smaller volume container desired. Lysis Buffer (per 1 L: 0.6 g EDTA, 1.0 g KHCO3, 8.2 g NH4Cl, adjusted to pH 7.4 with NaOH) is added directly to the blood sample (where the blood sample does not have the serum removed) in a ratio of 3 parts Lysis Buffer to 1 part blood. Sample is mixed and placed on ice for 5-10 minutes until transparent. Lysed sample is centrifuged at 1000 rpm for 10 minutes at 4° C., and supernatant is aspirated. Pellet is resuspended in 5 ml Lysis Buffer and centrifuged again at 1000 rpm for 10 minutes at 4° C. Pelleted cells are homogenized using TRIzol (GIBCO/BRL) in a ratio of approximately 6 ml of TRIzol for every 10 ml of the original blood sample and vortexed well. Samples are left for 5 minutes at room temperature. RNA is extracted using 1.2 ml of chloroform per 1 ml of TRIzol. Sample is centrifuged at 12,000×g for 5 minutes at 4° C. and upper layer is collected. To upper layer, isopropanol is added in a ratio of 0.5 ml per 1 ml of TRIzol. Sample is left overnight at −20° C. or for one hour at −20° C. RNA is pelleted in accordance with known methods, RNA pellet air dried, and pellet resuspended in DEPC-treated ddH₂O. RNA samples can also be stored in 75% ethanol, in which the samples are stable at room temperature for transportation.
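The Lysis Buffer is given by mass per litre here and by molarity in part (a). The sketch below converts one into the other as a consistency check. It assumes the EDTA is the disodium dihydrate salt (about 372.24 g/mol), because that is the form for which 0.6 g/L corresponds to the 1.6 mM of part (a), and takes the bicarbonate to be KHCO3 as stated in part (a); the helper names are hypothetical.

```python
from typing import Dict

# Molar masses in g/mol. EDTA is assumed to be the disodium dihydrate salt.
MOLAR_MASS = {"EDTA (Na2, 2H2O)": 372.24, "KHCO3": 100.115, "NH4Cl": 53.49}

# Lysis Buffer recipe from part (b), grams per litre.
RECIPE_G_PER_L = {"EDTA (Na2, 2H2O)": 0.6, "KHCO3": 1.0, "NH4Cl": 8.2}


def millimolar(grams_per_litre: float, molar_mass: float) -> float:
    """Convert g/L to mmol/L."""
    return grams_per_litre / molar_mass * 1000.0


def recipe_in_mM(recipe: Dict[str, float] = RECIPE_G_PER_L,
                 masses: Dict[str, float] = MOLAR_MASS) -> Dict[str, float]:
    """Concentration of each component in mmol/L."""
    return {name: millimolar(grams, masses[name]) for name, grams in recipe.items()}


def lysis_buffer_ml(blood_ml: float, parts_buffer_per_part_blood: float = 3.0) -> float:
    """Lysis Buffer volume for the 3:1 buffer-to-blood ratio used in the text."""
    return blood_ml * parts_buffer_per_part_blood


if __name__ == "__main__":
    for name, mm in recipe_in_mM().items():
        print(f"{name}: {mm:.1f} mM")
    print(lysis_buffer_ml(10.0), "ml Lysis Buffer for 10 ml blood")
```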

(c) From Serum-Reduced Whole Blood

10 ml whole blood is obtained in a Vacutainer and spun at 2,000 rpm (800×g) for 5 min at 4° C., and the plasma layer is removed. The remaining blood sample is homogenized using TRIzol (GIBCO/BRL) in a ratio of approximately 6 ml of TRIzol for every 10 ml of the original blood sample and vortexed well. Samples are left for 5 minutes at room temperature. RNA is extracted using 1.2 ml of chloroform per 1 ml of TRIzol. The sample is centrifuged at 12,000×g for 5 minutes at 4° C., and the upper layer is collected. Isopropanol is added to the upper layer in a ratio of 0.5 ml per 1 ml of TRIzol. The sample is left overnight at −20° C. or for one hour at −20° C. RNA is pelleted in accordance with known methods, the RNA pellet is air dried, and the pellet is resuspended in DEPC-treated ddH₂O. RNA samples can also be stored in 75% ethanol, in which they are stable at room temperature for transportation.
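Protocols (b) and (c) give centrifugation speeds in rpm, and (c) pairs 2,000 rpm with 800×g. The standard relation RCF = 1.118 × 10⁻⁵ × r × rpm² (r = rotor radius in cm) converts between the two. The rotor radius is not stated, so the sketch below only infers the radius implied by the 2,000 rpm / 800×g pair and, assuming the same rotor, estimates the force at the 1,000 rpm used in (b).

```python
"""Convert between rpm and relative centrifugal force (x g)."""

RCF_CONSTANT = 1.118e-5  # empirical constant in RCF = k * r(cm) * rpm^2


def rcf_from_rpm(rpm: float, radius_cm: float) -> float:
    """Relative centrifugal force (x g) at a given speed and rotor radius."""
    return RCF_CONSTANT * radius_cm * rpm ** 2


def rpm_from_rcf(rcf: float, radius_cm: float) -> float:
    """Speed (rpm) needed to reach a target RCF with a given rotor radius."""
    return (rcf / (RCF_CONSTANT * radius_cm)) ** 0.5


def implied_radius_cm(rpm: float, rcf: float) -> float:
    """Rotor radius implied by a stated rpm / x g pair."""
    return rcf / (RCF_CONSTANT * rpm ** 2)


if __name__ == "__main__":
    r = implied_radius_cm(2000, 800)  # from "2,000 rpm (800 x g)" in protocol (c)
    print(f"implied rotor radius: {r:.1f} cm")
    # ASSUMPTION: protocol (b) uses the same rotor.
    print(f"1,000 rpm on that rotor: {rcf_from_rpm(1000, r):.0f} x g")
```

With the implied radius of about 17.9 cm, 1,000 rpm corresponds to roughly 200×g; a different rotor would give a different value, which is why quoting ×g is the more portable convention.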

(d) From Whole Blood

10 ml whole blood is obtained in a Vacutainer, and the sample is homogenized using TRIzol (GIBCO/BRL) in a ratio of approximately 6 ml of TRIzol for every 10 ml of the blood sample and vortexed well. Samples are left for 5 minutes at room temperature. RNA is extracted using 1.2 ml of chloroform per 1 ml of TRIzol. The sample is centrifuged at 12,000×g for 5 minutes at 4° C., and the upper layer is collected. Isopropanol is added to the upper layer in a ratio of 0.5 ml per 1 ml of TRIzol. The sample is left overnight at −20° C. or for one hour at −20° C. RNA is pelleted in accordance with known methods, the RNA pellet is air dried, and the pellet is resuspended in DEPC-treated ddH₂O. RNA samples can also be stored in 75% ethanol, in which they are stable at room temperature for transportation.
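The reagent ratios in protocols (b) to (d) scale with the starting blood volume. The hypothetical helper below simply multiplies them out. The chloroform ratio is exposed as a parameter because (b) to (d) state 1.2 ml per 1 ml of TRIzol, whereas (a) uses 0.2 ml of chloroform with 1.0 ml of TRIzol.

```python
"""Scale TRIzol-step reagent volumes to the starting blood volume."""
from dataclasses import dataclass


@dataclass(frozen=True)
class TrizolVolumes:
    trizol_ml: float
    chloroform_ml: float
    isopropanol_ml: float


def trizol_volumes(
    blood_ml: float,
    trizol_per_ml_blood: float = 0.6,        # ~6 ml TRIzol per 10 ml original blood
    chloroform_per_ml_trizol: float = 1.2,   # as written in (b)-(d); (a) uses 0.2 per 1.0
    isopropanol_per_ml_trizol: float = 0.5,  # 0.5 ml isopropanol per 1 ml TRIzol
) -> TrizolVolumes:
    """Return reagent volumes (ml) for a given starting blood volume."""
    trizol = blood_ml * trizol_per_ml_blood
    return TrizolVolumes(
        trizol_ml=trizol,
        chloroform_ml=trizol * chloroform_per_ml_trizol,
        isopropanol_ml=trizol * isopropanol_per_ml_trizol,
    )


if __name__ == "__main__":
    print(trizol_volumes(10.0))
    print(trizol_volumes(10.0, chloroform_per_ml_trizol=0.2))
```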

(e) From Whole Blood Using PAXgene™

2.5 ml whole blood is collected into PAXgene™ Blood RNA Tubes and processed in accordance with the instructions of the PAXgene™ Blood RNA Kit protocol. In brief, after the blood has been stored in the PAXgene™ tube for at least 2 hours, the blood sample is centrifuged and the supernatant discarded. 360 μl of the supplied Buffer BR1 is added to the remaining sample, and the sample is pipetted into the spin column, centrifuged, washed through several wash steps, and finally eluted and stored.

(f) From Whole Blood Using PAXgene™ and Subsequent Globin Reduction

RNA isolated using PAXgene™ as noted in (e) above is subsequently treated to selectively remove globin mRNA as described in the Affymetrix® technical note entitled “Globin Reduction Protocol.” Oligonucleotides specific for the alpha 1, alpha 2 and beta globin species are incubated with the RNA in an oligonucleotide hybridization buffer, RNase H is used to specifically target degradation of the globin mRNA, and the Affymetrix cRNA cleanup column is used to remove the globin mRNA.

Example 2 Target Nucleic Acid Preparation and Hybridization

(a) Genes which are Differentially Expressed with the Presence of One or More Colorectal Pathologies

Total RNA (5 μg) was labeled and hybridized onto Affymetrix U133 Plus 2.0 GeneChips (Affymetrix; Santa Clara, Calif.), along with other similarly prepared samples from individuals having or not having polyps, according to the manufacturer's instructions. Briefly, the 5 μg of total RNA was used for cDNA synthesis with the GeneChip T7-Oligo(dT) primer provided in the promoter primer kit (Affymetrix, P/N 900375). The cDNA was cleaned up with a cDNA Cleanup Spin Column and then used for synthesis of biotin-labeled cRNA with the Enzo® BioArray™ High Yield™ RNA Transcript Labeling Kit. The labeled cRNA was further purified using the IVT cRNA Cleanup Spin Column and quantified by absorbance at 260 nm in a spectrophotometer. 20 μg of cRNA was then added to the hybridization cocktail, and the cocktail was applied to the probe array cartridge. After approximately 16 hours of hybridization, the array was washed with an Affymetrix Fluidics Station 400 and then scanned with an Affymetrix® GeneChip® Scanner.

Hybridization signals were scaled in the Affymetrix GCOS software (version 1.1.1), using a scaling factor determined by adjusting the global trimmed mean signal intensity value to 500 for each array, and imported into GeneSpring version 7.2 (Silicon Genetics; Redwood City, Calif.). Signal intensities were then centered to the 50th percentile of each chip and, for each individual gene, to the median intensity of the whole sample set. Only genes called present or marginal by the GCOS software in at least 80% of each group of samples were used for further analysis. Differentially expressed genes were identified using 1) the non-parametric Wilcoxon-Mann-Whitney test (P<0.05), 2) a parametric t test (P<0.05), and/or 3) the unsupervised analysis method (14). In the unsupervised analysis, the signal-intensity-filtered genes were used to select genes with at least a 2-fold change (up or down) in expression level, away from the mean, in at least 15% of the samples. Hierarchical cluster analysis was performed on each comparison, using Spearman correlation among samples for each identified gene set as the similarity measure, with average centroid linkage, in GeneSpring v6.0. Results from numerous experiments were analyzed, and a compiled list of results is provided in Table 1.
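The scaling, per-chip and per-gene normalization and the 2-fold/15% filter can be approximated with NumPy as below. This is a sketch under stated assumptions, not the GCOS or GeneSpring implementation: the 2% trim for the trimmed mean, the use of simple division for "centering", and measuring fold change against each gene's mean normalized value are assumptions.

```python
import numpy as np


def trimmed_mean(values: np.ndarray, trim: float = 0.02) -> float:
    """Mean after dropping the lowest and highest `trim` fraction (2% is an assumption)."""
    v = np.sort(np.asarray(values, dtype=float))
    k = int(len(v) * trim)
    return float(v[k:len(v) - k].mean())


def normalize(signal: np.ndarray, target: float = 500.0) -> np.ndarray:
    """signal: genes x samples matrix of raw intensities -> normalized ratios."""
    # 1) Scale each array so that its trimmed mean equals the target (500).
    factors = np.array([target / trimmed_mean(signal[:, j]) for j in range(signal.shape[1])])
    scaled = signal * factors
    # 2) Per chip: divide by the chip's 50th percentile (median).
    per_chip = scaled / np.median(scaled, axis=0, keepdims=True)
    # 3) Per gene: divide by that gene's median across the whole sample set.
    return per_chip / np.median(per_chip, axis=1, keepdims=True)


def fold_change_filter(norm: np.ndarray, fold: float = 2.0, min_fraction: float = 0.15) -> np.ndarray:
    """True for genes changed >= `fold` (up or down) from their mean in >= `min_fraction` of samples."""
    rel = norm / norm.mean(axis=1, keepdims=True)
    changed = (rel >= fold) | (rel <= 1.0 / fold)
    return changed.mean(axis=1) >= min_fraction


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    raw = rng.uniform(50.0, 5000.0, size=(1000, 12))  # hypothetical intensities
    kept = fold_change_filter(normalize(raw))
    print(f"{kept.sum()} of {len(kept)} genes pass the 2-fold / 15% filter")
```

Because step 2 divides each chip by its own median, the per-array scaling of step 1 cancels in the final ratios; it is kept only to mirror the sequence described.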

(b) Genes which are Differentially Expressed with the Presence of High Risk Polyps

Total RNA was isolated, as described in Example 1, from centrifuged lysed blood (i.e., serum-reduced, erythrocyte-reduced blood) from patients diagnosed as having high risk polyps. 1 μg of oligo-dT primers was annealed to 10 μg of total RNA for each individual tested, in a total volume of 10 μl, by heating to 70° C. for 10 min and cooling on ice. Individuals were diagnosed as having one or more of the high risk polyp subtypes when the polyp identified during colonoscopy was classified as one or more of the following types: Tubulovillous Adenoma; Villous Adenoma; Cancer; High Grade Dysplasia; and Tubular Adenoma with a diameter greater than 10 mm. Procedures were otherwise carried out as described in Example 2(a) above.

Results from numerous experiments were analyzed, and a compiled list of results is provided in Table 11.

Example 3 Quantitative Real Time PCR (QRT-PCR)

QRT-PCR was performed on a selection of the genes in Table 1, which are disclosed in Table 2. QRT-PCR was done using the SYBR® Green Kit from Qiagen (Product Number 204143) and/or Applied Biosystems PCR reagent kits (Cat 4334973). Amplicons were detected in real time using a Prism 7500 instrument (Applied Biosystems).

Reverse transcription was first performed using the High-Capacity cDNA Archive Kit from Applied Biosystems (Product number 4322171), following the protocol provided with the kit.

More specifically, purified RNA as described previously herein was incubated with reverse transcriptase buffer, dNTPs, random primers and reverse transcriptase at 25° C. for 10 minutes and subsequently at 37° C. for two hours, and the resulting mixture was used as the starting product for quantitative PCR.

cDNA resulting from reverse transcription was incubated with the QuantiTect SYBR® Green PCR Master Mix as provided; no adjustments were made for magnesium concentration, and Uracil-N-Glycosylase was not added. 5 μM each of forward and reverse primer specific to the genes of the invention were added, and the reaction was incubated and monitored in accordance with the standard protocol using the ABI PRISM 7700/ABI GeneAmp 5700/iCycler/DNA Engine Opticon. The primers used are shown in Table 6; other examples of primers that can be used are disclosed in Table 4. Forward and reverse primers for the candidate biomarkers were designed using “PrimerQuest,” a tool available from Integrated DNA Technologies, Coralville, Iowa. Table 6 lists the primer sets for eight of the genes of Table 2, namely MBTPS1 (membrane-bound transcription factor protease site 1), MGC45871, MKLN1 (muskelin 1), NIPBL (Nipped-B homolog (Drosophila)), APEH (acylpeptide hydrolase), FLJ23091, MGC40157, and PPP1R2 (protein phosphatase 1, regulatory (inhibitor) subunit 2). Serial dilution measurements for the target genes and a housekeeping gene (beta-actin, ACTB) were assayed to ensure that the values were within the linear range and that the amplification efficiency was approximately equal for each target and ACTB. ACTB was selected as the housekeeping gene because no statistically significant differences were observed between the control and disease groups in this study (data not shown). The melting temperature (Tm) from thermal dissociation, together with examination on agarose gels, confirmed specific PCR amplification and the absence of primer-dimer formation in each reaction well.
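The serial-dilution check above is commonly summarized by a standard curve: Ct is regressed on log10 of the input amount, and the slope gives the amplification efficiency through E = 10^(−1/slope) − 1. The text does not state which efficiency formula or acceptance window was used, so the function below and the dilution series in its demo are illustrative assumptions.

```python
from __future__ import annotations

import numpy as np


def standard_curve(log10_input, ct) -> tuple[float, float, float]:
    """Fit Ct = slope * log10(input) + intercept over a serial dilution.

    Returns (slope, efficiency, r_squared), with efficiency E = 10**(-1/slope) - 1;
    E = 1.0 corresponds to perfect doubling each cycle (slope of about -3.32).
    """
    log10_input = np.asarray(log10_input, dtype=float)
    ct = np.asarray(ct, dtype=float)
    slope, intercept = np.polyfit(log10_input, ct, 1)
    predicted = slope * log10_input + intercept
    ss_res = float(np.sum((ct - predicted) ** 2))
    ss_tot = float(np.sum((ct - ct.mean()) ** 2))
    efficiency = 10.0 ** (-1.0 / slope) - 1.0
    return float(slope), float(efficiency), 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    dilutions = np.log10([1.0, 0.1, 0.01, 0.001])  # hypothetical 10-fold series
    target_ct = [22.1, 25.5, 28.8, 32.2]            # hypothetical Ct values
    actb_ct = [16.0, 19.3, 22.7, 26.0]              # hypothetical Ct values
    for name, cts in [("target", target_ct), ("ACTB", actb_ct)]:
        slope, eff, r2 = standard_curve(dilutions, cts)
        print(f"{name}: slope={slope:.2f} efficiency={eff:.0%} R^2={r2:.3f}")
```

Comparing the two efficiencies side by side is one way to judge whether they are "approximately equal" before relying on ΔCt.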

For individual target gene analysis, the difference in Ct value between each target gene and the ACTB housekeeping gene was calculated as ΔCt = Ct(target gene) − Ct(housekeeping gene).
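A minimal sketch of the ΔCt calculation follows. Averaging replicate wells before subtracting (the blind-test samples of Example 5 were run in triplicate) and the 2^(−ΔCt) conversion, which assumes roughly 100% amplification efficiency, are assumptions added for illustration. A larger ΔCt means lower expression of the target relative to ACTB.

```python
from __future__ import annotations

from statistics import mean


def delta_ct(target_cts: list[float], housekeeping_cts: list[float]) -> float:
    """ΔCt = Ct(target gene) - Ct(housekeeping gene), averaging replicate wells first."""
    return mean(target_cts) - mean(housekeeping_cts)


def relative_expression(dct: float) -> float:
    """2**(-ΔCt): expression relative to ACTB, assuming ~100% efficiency for both genes."""
    return 2.0 ** (-dct)


if __name__ == "__main__":
    # Hypothetical triplicate Ct values for one sample.
    dct = delta_ct([26.1, 26.3, 26.2], [18.0, 18.1, 17.9])
    print(f"ΔCt = {dct:.2f}, relative expression = {relative_expression(dct):.4f}")
```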

Example 4

TaqMan® QRT-PCR can also be performed using the QuantiTect™ Probe RT-PCR system from Qiagen (Product Number 204343) in conjunction with a TaqMan® dual labelled probe and primers corresponding to the gene of interest. The TaqMan® probe and primers can be ordered from Applied Biosystems Assays-On-Demand™.

The dual labelled probe contains both a fluorophore and a quencher molecule. The proximity of the fluorescent reporter to the quencher prevents the reporter from fluorescing, but during the PCR extension step the 5′-3′ exonuclease activity of the Taq DNA polymerase cleaves the probe and releases the fluorophore from the quencher, allowing it to fluoresce. As such, the amount of fluorescence correlates with the amount of PCR product generated. Examples of TaqMan probes that can be used with the genes disclosed in Table 2 are shown in Tables 6 and/or 4.

Example 5 Identification of Combinations of Biomarkers Using Logistic Regression

A selection of eight genes from Table 2 was chosen: MBTPS1, MGC45871, MKLN1, NIPBL, APEH, MGC40157, PPP1R2 and FLJ23091. Pairwise combinations of these eight genes were tested to determine the ability of each pair to screen for one or more colorectal pathologies. Blood samples were drawn into lavender-top BD Vacutainer tubes before anaesthesia and before any surgery. RNA from whole blood was prepared by first removing the serum and then treating the remaining sample with hypotonic Lysis Buffer (Lysis Buffer (1 L): 0.6 g EDTA, 1.0 g KHCO₃, 8.2 g NH₄Cl, adjusted to pH 7.4 using NaOH) in a ratio of 3 parts Lysis Buffer to 1 part blood, so as to preferentially lyse the red blood cells. The samples were centrifuged, and RNA was extracted from the unfractionated cells of the sample.

Real-time quantitative PCR was conducted on 68 patients diagnosed as having colorectal pathology (i.e., one or more subtypes of polyps) and 110 control individuals diagnosed as not having any colorectal pathology. QRT-PCR was performed in two steps, with cDNA prepared first and PCR then performed using ABI reagents. In this example, a matrix containing the ΔCt values of “ratios” of the eight biomarkers was used to create a reference training data set (AJ36h). The ratios used to generate the ΔCt matrix were further constrained so that each ratio comprised one upregulated gene and one downregulated gene, and each ΔCt was obtained by subtracting the Ct of the downregulated gene from the Ct of the upregulated gene. MBTPS1, MGC45871, MKLN1 and NIPBL were identified as upregulated genes (i.e., when comparing individuals with polyps to individuals without polyps) (FIG. 3), and APEH, MGC40157, PPP1R2 and FLJ23091 were identified as downregulated genes (FIG. 3). The advantages of the gene-pair ratio method include: a) reduced inter-individual variability; b) analysis of individual samples without references, so that technical differences between plates are minimized; c) compatibility with any reliable measurement method (microarray, real-time PCR, etc.); d) independence from the platform used for data acquisition; e) no requirement for a housekeeping gene (relative independence from the amount of input sample); f) translation of the strengths of microarray expression profiling into simple clinical tests; and g) high precision in disease discrimination.
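The constrained gene-pair "ratios" can be generated as sketched below: with four upregulated and four downregulated genes there are 4 × 4 = 16 ratios, each with ΔCt = Ct(up) − Ct(down). A shift applied equally to every Ct of a sample cancels, which is one way to see advantages (b) and (e): if both genes were first normalized to ACTB, (Ct_up − Ct_ACTB) − (Ct_down − Ct_ACTB) = Ct_up − Ct_down. The dictionary-of-Cts input format is a hypothetical convention.

```python
from __future__ import annotations

from itertools import product

UP = ["MBTPS1", "MGC45871", "MKLN1", "NIPBL"]      # upregulated with polyps (FIG. 3)
DOWN = ["APEH", "MGC40157", "PPP1R2", "FLJ23091"]  # downregulated with polyps (FIG. 3)


def ratio_features(ct: dict[str, float]) -> dict[str, float]:
    """All 16 constrained 'ratio' ΔCts for one sample: Ct(up) - Ct(down)."""
    return {f"{u}/{d}": ct[u] - ct[d] for u, d in product(UP, DOWN)}


if __name__ == "__main__":
    sample = {g: 20.0 + i for i, g in enumerate(UP + DOWN)}   # hypothetical Cts
    features = ratio_features(sample)
    shifted = {g: v + 1.5 for g, v in sample.items()}          # uniform plate offset
    print(len(features), features["MBTPS1/APEH"], ratio_features(shifted) == features)
```

Run as a script, this prints `16 -4.0 True`: the uniform offset leaves every ratio unchanged.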

A reference (training) data set (AJ36h) was constructed containing ΔCt values for each possible ratio, constrained as described above, for the above eight genes assayed against a total of 178 subjects: 110 subjects without pathology (female/male 55/54, with information missing for one subject; mean age 57 years, range 23 to 83 years) and 68 subjects with diagnosed colorectal pathology (female/male 22/45, with information missing for one subject; mean age 57 years, range 38 to 82 years). The pathologies identified comprised 21 (31%) tubular adenomas, 18 (27%) hyperplastic polyps, 7 (10%) high risk pathologies (villous morphology), and 22 (32%) other minor polyps (see Table 7).

Logistic regression (15-18) was used to analyze the dependence of the binary variable Y (0 = control, no pathology; 1 = disease, pathology present) on all possible combinations of the ΔCt values from the reference training data set AJ36h. If P is the probability that a patient sample is diagnosed as “diseased”, then a function X = logit(P) can be defined as follows:

$$X = \mathrm{logit}(P) = \ln\left(\frac{P}{1-P}\right) = b_0 + b_1\,\Delta\mathrm{Ct}_1 + b_2\,\Delta\mathrm{Ct}_2 + \cdots + b_n\,\Delta\mathrm{Ct}_n \qquad \text{(Eq. 1)}$$

If X ≥ threshold, then Y = 1 (diagnosis = “diseased”), and if X < threshold, then Y = 0 (diagnosis = “control”). The (empirical) coefficients {bi} that define the relationship between X and the experimental measurements {ΔCti, where i represents a sample} were obtained by a maximum-likelihood (ML) fitting method. Identical {bi} values were obtained using several different commercial software programs: MedCalc (MedCalc Software, Mariakerke, Belgium) and XL-Stat (Addinsoft Software, Paris, France). ROC curve analysis (19-21) was then used to evaluate the discriminatory power of every combination of ratios described above, each combination yielding an equation in the form of Eq. 1. The ten best equations, each of which gave an ROC area of 0.72, were combined with equal weight into a formula used to perform the subsequent blind-test prediction.
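The maximum-likelihood fit and ROC area described above can be sketched in NumPy as follows. Newton-Raphson (iteratively reweighted least squares) is one standard way to maximize the logistic likelihood; it is not necessarily the algorithm used inside MedCalc or XL-Stat, and it assumes the two classes are not perfectly separable. The ROC area uses the rank (Mann-Whitney) formulation, counting ties as one half.

```python
import numpy as np


def fit_logistic(X: np.ndarray, y: np.ndarray, max_iter: int = 50, tol: float = 1e-10) -> np.ndarray:
    """Maximum-likelihood logistic regression by Newton-Raphson (IRLS).

    X: samples x predictors (e.g. ratio ΔCt values); y: 0 = control, 1 = disease.
    Returns [b0, b1, ..., bn] for X = b0 + b1*ΔCt1 + ... + bn*ΔCtn.
    """
    A = np.column_stack([np.ones(len(X)), X])   # intercept column for b0
    b = np.zeros(A.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-(A @ b)))       # P = inverse logit of X
        gradient = A.T @ (y - p)                  # score of the log-likelihood
        information = A.T @ (A * (p * (1.0 - p))[:, None])  # negative Hessian
        step = np.linalg.solve(information, gradient)
        b = b + step
        if np.max(np.abs(step)) < tol:
            break
    return b


def roc_auc(scores: np.ndarray, y: np.ndarray) -> float:
    """ROC area: chance a random diseased sample scores above a random control (ties = 1/2)."""
    pos, neg = scores[y == 1], scores[y == 0]
    greater = np.sum(pos[:, None] > neg[None, :])
    ties = np.sum(pos[:, None] == neg[None, :])
    return float((greater + 0.5 * ties) / (len(pos) * len(neg)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(178, 2))                # hypothetical ratio ΔCt features
    y = (rng.uniform(size=178) < 1.0 / (1.0 + np.exp(-X[:, 0]))).astype(float)
    b = fit_logistic(X, y)
    print("coefficients:", np.round(b, 3), "ROC area:", round(roc_auc(b[0] + X @ b[1:], y), 3))
```

At the fitted coefficients the log-likelihood gradient is zero, which is the defining condition of the ML estimate and a convenient check that any implementation has converged.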

Cross Validation

Cross-validation was performed on dataset AJ36h (the 178-sample training set) using WEKA (www.cs.waikato.ac.nz/~ml/weka/index.html) (22). Two different cross-validation schemes were used. The WEKA MetaAnalysis function was used to construct 100 bootstrap replicates of dataset AJ36h. Each of the new datasets was analyzed with the SimpleLogistic classifier. Each of the resultant 100 logistic equations was then analyzed by 10-fold cross-validation. The results for all equations were averaged (“bootstrap aggregating”).
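The bootstrap-plus-cross-validation scheme can be outlined as below. This is one reading of the procedure, not WEKA's implementation: `evaluate` is a hypothetical caller-supplied function that fits a classifier on the training fold and returns a score (for example ROC area) on the test fold. Because a bootstrap replicate contains duplicated subjects, the same subject can fall into both training and test folds; this sketch does not correct for that.

```python
import numpy as np


def bootstrap_indices(n_samples, n_replicates, rng):
    """Row indices for each bootstrap replicate (n_samples draws with replacement)."""
    return [rng.integers(0, n_samples, size=n_samples) for _ in range(n_replicates)]


def kfold_splits(n_samples, k, rng):
    """Yield (train_idx, test_idx) for k shuffled, disjoint test folds."""
    folds = np.array_split(rng.permutation(n_samples), k)
    for i in range(k):
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, folds[i]


def bagged_cv_score(X, y, evaluate, n_replicates=100, k=10, seed=0):
    """Average k-fold CV score over bootstrap replicates of (X, y)."""
    rng = np.random.default_rng(seed)
    replicate_scores = []
    for idx in bootstrap_indices(len(y), n_replicates, rng):
        Xb, yb = X[idx], y[idx]
        fold_scores = [evaluate(Xb[tr], yb[tr], Xb[te], yb[te])
                       for tr, te in kfold_splits(len(yb), k, rng)]
        replicate_scores.append(np.mean(fold_scores))
    return float(np.mean(replicate_scores))
```

With 178 subjects and k = 10, each test fold holds 17 or 18 rows.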

Prospective (Blind) Test

A blind set of 80 clinical samples was tested, consisting of 40 controls and 40 subjects with one or more colorectal pathologies (one or more polyps of any type). None of the blind-test subjects were used to generate the classifiers, which were built for each possible combination of biomarker “ratios”, each ratio pairing a gene upregulated in colorectal pathology (relative to no colorectal pathology) with a gene downregulated in colorectal pathology. MBTPS1, MGC45871, MKLN1 and NIPBL were identified as upregulated genes (i.e., when comparing individuals with polyps to individuals without polyps) (FIG. 3), and APEH, MGC40157, PPP1R2 and FLJ23091 were identified as downregulated genes (FIG. 3).

Medical information on subjects in this blind set, including age, gender and pathologist's report, is summarized in Table 7. Samples were run on a single 96-well plate for QRT-PCR, with each of the eight genes measured in triplicate.

The measured values for each blind sample were evaluated using the following algorithm, which consists of an initial calculation, a binary logic gate, and a “committee machine” vote. Initial calculation: the logit function X_i is computed for each equation i (i = 1 to N) that gave an ROC area of 0.72 against the reference data set AJ36h. Logic gate: if X_i < threshold_i (the threshold of equation i), the blind sample is given the score S_i = −1 for that equation; if X_i ≥ threshold_i, it is given the score S_i = +1. Vote by committee machine (23): a vote is taken over the scores from the 10 logit equations used for diagnosis, defined as Vote = Σ_i S_i. If Vote ≤ 0, the sample is called “no pathology”; if Vote > 0, the sample is called “diseased or with pathology.”
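The three-stage rule (logit, gate, vote) can be written compactly as below. The parameters of Table 8 are not reproduced here; the class and function names and the use of named ratio features are hypothetical. With ten equations, a 5-to-5 split gives Vote = 0, which the rule calls "no pathology".

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class LogitEquation:
    """One logit equation X = b0 + sum(b_k * ΔCt_k) with its own decision threshold."""
    intercept: float
    coefficients: dict[str, float]  # ratio name -> coefficient
    threshold: float

    def logit(self, features: dict[str, float]) -> float:
        return self.intercept + sum(b * features[name] for name, b in self.coefficients.items())

    def score(self, features: dict[str, float]) -> int:
        """Logic gate: S = +1 if X >= threshold, else -1."""
        return 1 if self.logit(features) >= self.threshold else -1


def committee_call(equations: list[LogitEquation], features: dict[str, float]) -> str:
    """Vote = sum of scores; Vote <= 0 -> no pathology, Vote > 0 -> pathology."""
    vote = sum(eq.score(features) for eq in equations)
    return "diseased or with pathology" if vote > 0 else "no pathology"
```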

The best equation gives an ROC area of 0.72, as shown in FIG. 4. The ten top equations, each giving an ROC area of 0.72, were used for the blind-test prediction. Table 8 lists the parameters for the 10 equations.

Cross-validation was done with the SimpleLogistic function in WEKA. One hundred bootstrap replicates of dataset AJ36h were constructed, and each was analyzed with the WEKA SimpleLogistic function. The resulting set of 100 logistic equations was subjected to 10-fold cross-validation. The results for all equations were then averaged, giving an ROC area of 0.66, an overall accuracy of 65%, a sensitivity (TPF) of 41% and a specificity (TNF) of 83%.

Prediction of Blind Samples

A blind test was conducted on an additional set of 80 samples (40 samples without pathology and 40 samples with colorectal pathology). A committee vote was taken over the ten (10) best logit equations from Reference Dataset AJ36h to predict a state of either “colorectal pathology present” or “colorectal pathology absent (control).” For the set of blind samples, the sensitivity (true positive fraction, TPF) is 43% (17/40) and the specificity (true negative fraction, TNF) is 80% (32/40), with an overall accuracy of 61% (49/80, Table 9).
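The reported blind-test figures follow directly from the confusion counts (17 of 40 diseased samples called positive, 32 of 40 controls called negative), as this minimal sketch shows.

```python
from __future__ import annotations


def screening_metrics(tp: int, fn: int, tn: int, fp: int) -> dict[str, float]:
    """Sensitivity (TPF), specificity (TNF) and overall accuracy from a 2x2 table."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }


if __name__ == "__main__":
    # Blind-test counts from the text: 17/40 diseased positive, 32/40 controls negative.
    for name, value in screening_metrics(tp=17, fn=23, tn=32, fp=8).items():
        print(f"{name}: {value:.2%}")
```

This prints 42.50%, 80.00% and 61.25%, which round to the 43%, 80% and 61% reported.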

While the present invention has been described with reference to what are presently considered to be the preferred examples, it is to be understood that the invention is not limited to the disclosed examples. To the contrary, the invention is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims.

All publications, patents and patent applications are herein incorporated by reference in their entirety to the same extent as if each individual publication, patent or patent application was specifically and individually indicated to be incorporated by reference in its entirety.

Example 6 Testing of Combinations of Biomarkers Using Logistic Regression and Application of Said Combinations to Screen for One or More Types of Colorectal Pathology or One or More Pathologies

A selection of seven genes was chosen: LIM domain containing preferred translocation partner in lipoma (LPP), Gene ID 4026; cytidine deaminase (CDA), Gene ID 978; sarcoma antigen NY-SAR-48 (MGC20553), Gene ID 93323; serine (or cysteine) proteinase inhibitor, clade E (nexin, plasminogen activator inhibitor type 1), member 2 (SERPINE2), Gene ID 5270; B-cell novel protein 1 (BCNP1), Gene ID 199786; hypothetical protein MGC45871 (MGC45871), Gene ID 359845; and membrane-bound transcription factor protease, site 1 (MBTPS1), Gene ID 8720. Genes were chosen from either Table 1 or other similar experiments.

Clinical Question—Polyp v. No Pathology

A reference (training) data set was constructed containing ΔCt values for the above seven genes assayed against a total of 185 subjects having any type of polyp and 239 subjects not having polyps.

Logistic regression was used to analyze the dependence of the binary variable Y (0 = control, 1 = disease) on the ΔCt values from the reference data set. If P is the probability that a patient sample is identified as “diseased”, then a function X = logit(P) can be defined as follows:

$$X = \mathrm{logit}(P) = \ln\left(\frac{P}{1-P}\right) = b_0 + b_1\,\Delta\mathrm{Ct}_1 + b_2\,\Delta\mathrm{Ct}_2 + \cdots + b_n\,\Delta\mathrm{Ct}_n \qquad \text{(Eq. 1)}$$

If X ≥ threshold, then Y = 1 (diagnosis = “has polyps”), and if X < threshold, then Y = 0 (diagnosis = “does not have polyps”). The (empirical) coefficients {bi} that define the relationship between X and the experimental measurements {ΔCti, where i represents a sample} were obtained by a maximum-likelihood (ML) fitting method. Identical {bi} values were obtained using several different commercial software programs: MedCalc (MedCalc Software, Mariakerke, Belgium) and XL-Stat (Addinsoft Software, Paris, France). ROC curve analysis was then used to evaluate the discriminatory power of the combinations. The best equation used the following genes (MGC20553; CDA; MBTPS1; SERPINE2; BCNP1) and is shown as follows:

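$$X = \mathrm{logit}(P) = \ln\left(\frac{P}{1-P}\right) = -4.9104 - 0.4278\,\Delta\mathrm{Ct}_{\mathrm{MGC20553}} - 0.6164\,\Delta\mathrm{Ct}_{\mathrm{CDA}} + 0.8230\,\Delta\mathrm{Ct}_{\mathrm{MBTPS1}} + 0.3961\,\Delta\mathrm{Ct}_{\mathrm{SERPINE2}} + 0.1641\,\Delta\mathrm{Ct}_{\mathrm{BCNP1}}$$

This equation gave an ROC area of 0.73, with a sensitivity of 80% and a specificity of 50%.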
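The following sketch evaluates the Polyp v. No Pathology equation for one sample and inverts the logit to a probability, P = 1/(1 + e^(−X)). The ΔCt values in the demo are hypothetical, and because the decision threshold for this equation is not given in the text, the sketch stops at X and P rather than making a call.

```python
import math

# Constant and coefficients of the "Polyp v. No Pathology" equation.
INTERCEPT = -4.9104
COEFFICIENTS = {
    "MGC20553": -0.4278,
    "CDA": -0.6164,
    "MBTPS1": 0.8230,
    "SERPINE2": 0.3961,
    "BCNP1": 0.1641,
}


def logit_score(delta_ct: dict) -> float:
    """X = b0 + sum(b_g * ΔCt_g) over the five genes in the equation."""
    return INTERCEPT + sum(b * delta_ct[gene] for gene, b in COEFFICIENTS.items())


def probability(x: float) -> float:
    """Invert X = ln(P / (1 - P)) to recover P."""
    return 1.0 / (1.0 + math.exp(-x))


if __name__ == "__main__":
    sample = {"MGC20553": 6.0, "CDA": 3.5, "MBTPS1": 9.0, "SERPINE2": 11.0, "BCNP1": 8.0}  # hypothetical
    x = logit_score(sample)
    print(f"X = {x:.3f}, P = {probability(x):.3f}")
```

Each coefficient is the change in X per one-cycle change in that gene's ΔCt, holding the other genes fixed.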

Clinical Question—High Risk Pathology v. Other

The same seven biomarkers were next tested against a reference (training) data set containing ΔCt values for the seven genes assayed against a total of 129 subjects having “high risk” polyps and 295 subjects either having a polyp not classified as high risk or having no pathology. Subjects were classified as having “high risk polyps” if they had one of the following categories of polyps: Tubulovillous Adenoma; Villous Adenoma; High Grade Dysplasia; Tubular Adenoma with a polyp diameter greater than 10 mm; and cancer.

Logistic regression was used to analyze the dependence of the binary variable Y (0 = control, 1 = disease) on the ΔCt values from the reference data set. If P is the probability that a patient sample is identified as “diseased”, then a function X = logit(P) can be defined as follows:

$$X = \mathrm{logit}(P) = \ln\left(\frac{P}{1-P}\right) = b_0 + b_1\,\Delta\mathrm{Ct}_1 + b_2\,\Delta\mathrm{Ct}_2 + \cdots + b_n\,\Delta\mathrm{Ct}_n \qquad \text{(Eq. 1)}$$

If X ≥ threshold, then Y = 1 (diagnosis = “has high risk polyps”), and if X < threshold, then Y = 0 (diagnosis = “does not have high risk polyps”). The (empirical) coefficients {bi} that define the relationship between X and the experimental measurements {ΔCti, where i represents a sample} were obtained by a maximum-likelihood (ML) fitting method. Identical {bi} values were obtained using several different commercial software programs: MedCalc (MedCalc Software, Mariakerke, Belgium) and XL-Stat (Addinsoft Software, Paris, France). ROC curve analysis was then used to evaluate the discriminatory power of the combinations. The best equation used the following genes (CDA; MGC20553; MBTPS1; SERPINE2; BCNP1) and is shown as follows:

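$$X = \mathrm{logit}(P) = \ln\left(\frac{P}{1-P}\right) = -5.2981 - 0.5433\,\Delta\mathrm{Ct}_{\mathrm{CDA}} - 0.4958\,\Delta\mathrm{Ct}_{\mathrm{MGC20553}} + 0.8551\,\Delta\mathrm{Ct}_{\mathrm{MBTPS1}} + 0.3554\,\Delta\mathrm{Ct}_{\mathrm{BCNP1}} + 0.2438\,\Delta\mathrm{Ct}_{\mathrm{SERPINE2}}$$

This equation gave an ROC area of 0.74, with a sensitivity of 83% and a specificity of 46%.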

Clinical Question—Cancer v. Other

Finally, the same seven genes were tested in combinations using a reference (training) data set containing ΔCt values for the seven genes assayed against a total of 80 subjects having cancerous polyps and 344 subjects having other types of polyps or having no pathology.

Logistic regression was used to analyze the dependence of the binary variable Y (0 = control, 1 = disease) on the ΔCt values from the reference data set. If P is the probability that a patient sample is identified as “diseased”, then a function X = logit(P) can be defined as follows:

$$X = \mathrm{logit}(P) = \ln\left(\frac{P}{1-P}\right) = b_0 + b_1\,\Delta\mathrm{Ct}_1 + b_2\,\Delta\mathrm{Ct}_2 + \cdots + b_n\,\Delta\mathrm{Ct}_n \qquad \text{(Eq. 1)}$$

If X ≥ threshold, then Y = 1 (diagnosis = “has cancerous polyps”), and if X < threshold, then Y = 0 (diagnosis = “does not have cancerous polyps”). The (empirical) coefficients {bi} that define the relationship between X and the experimental measurements {ΔCti, where i represents a sample} were obtained by a maximum-likelihood (ML) fitting method. Identical {bi} values were obtained using several different commercial software programs: MedCalc (MedCalc Software, Mariakerke, Belgium) and XL-Stat (Addinsoft Software, Paris, France). ROC curve analysis was then used to evaluate the discriminatory power of the combinations. The best equation used the following genes (MGC20553; CDA; MBTPS1; SERPINE2; MGC45871; BCNP1) and is shown as follows:

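$$X = \mathrm{logit}(P) = \ln\left(\frac{P}{1-P}\right) = -12.9149 - 0.5378\,\Delta\mathrm{Ct}_{\mathrm{CDA}} - 0.5398\,\Delta\mathrm{Ct}_{\mathrm{MGC20553}} + 1.0386\,\Delta\mathrm{Ct}_{\mathrm{MBTPS1}} + 0.7405\,\Delta\mathrm{Ct}_{\mathrm{BCNP1}} + 0.4002\,\Delta\mathrm{Ct}_{\mathrm{MGC45871}} + 0.2074\,\Delta\mathrm{Ct}_{\mathrm{SERPINE2}}$$

This equation gave an ROC area of 0.83, with a sensitivity of 90% and a specificity of 55%.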

Clinical Question—High Risk Pathology v. Other

A reference (training) data set was constructed containing ΔCt values for the above seven genes assayed against a total of 252 subjects having high risk polyps and 272 subjects having other types of polyps or having no pathology, where “high risk polyps” means Tubulovillous Adenoma, Villous Adenoma, High Grade Dysplasia and Tubular Adenoma, regardless of polyp diameter.

Logistic regression was used to analyze the dependence of the binary variable Y (0 = control, 1 = disease) on the ΔCt values from the reference data set. If P is the probability that a patient sample is identified as “diseased”, then a function X = logit(P) can be defined as follows:

$$X = \mathrm{logit}(P) = \ln\left(\frac{P}{1-P}\right) = b_0 + b_1\,\Delta\mathrm{Ct}_1 + b_2\,\Delta\mathrm{Ct}_2 + \cdots + b_n\,\Delta\mathrm{Ct}_n \qquad \text{(Eq. 1)}$$

If X ≥ threshold, then Y = 1 (diagnosis = “has high risk polyps”), and if X < threshold, then Y = 0 (diagnosis = “does not have high risk polyps”). The (empirical) coefficients {bi} that define the relationship between X and the experimental measurements {ΔCti, where i represents a sample} were obtained by a maximum-likelihood (ML) fitting method. Identical {bi} values were obtained using several different commercial software programs: MedCalc (MedCalc Software, Mariakerke, Belgium) and XL-Stat (Addinsoft Software, Paris, France). ROC curve analysis was then used to evaluate the discriminatory power of the combinations. The best equation used the following genes (MGC20553; CDA; MBTPS1; SERPINE2; BCNP1) and is shown as follows:

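$$X = \mathrm{logit}(P) = \ln\left(\frac{P}{1-P}\right) = -3.5819 - 0.7625\,\Delta\mathrm{Ct}_{\mathrm{CDA}} - 0.6117\,\Delta\mathrm{Ct}_{\mathrm{MGC20553}} + 1.15\,\Delta\mathrm{Ct}_{\mathrm{MBTPS1}} + 0.2174\,\Delta\mathrm{Ct}_{\mathrm{BCNP1}} + 0.2439\,\Delta\mathrm{Ct}_{\mathrm{SERPINE2}}$$

This equation gave an ROC area of 0.76, with a sensitivity of 82% and a specificity of 52%.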

Example 7 Biomarkers to Screen for Presence of Colorectal Cancer. Deriving Classifiers to be Used with Combinations of Biomarkers and Application of Said Classifiers to Determine Presence of Colorectal Cancer

QRT-PCR was performed on a selection of the genes identified from one or more microarray analyses (data not shown) as being able to differentiate between individuals having colorectal cancer and individuals not having colorectal cancer. Some of the genes selected for QRT-PCR were selected from microarray data obtained from 593 samples collected in three regions of China, including 61 samples from individuals diagnosed with colorectal cancer and 532 from individuals not having colorectal cancer (data not shown). Other genes were selected from other similar microarray experiments with individuals from North America and/or Asia. The individuals not having colorectal cancer included a mixture of individuals having breast cancer, kidney cancer, prostate cancer or bladder cancer, and individuals having other subtypes of colorectal pathology that were not colorectal cancer.

QRT-PCR was done on each individual gene across a population of individuals having colorectal cancer and a population of individuals not having colorectal cancer. QRT-PCR experiments were done using the SYBR® Green Kit from Qiagen (Product Number 204143), Applied Biosystems PCR reagent kits (Cat 4334973), and/or a TaqMan assay using the QuantiTect® Probe PCR Kit (Qiagen, Cat. #204345). TaqMan® probes were developed for each gene of interest and labelled with FAM and the Black Hole Quencher® from Biosearch Technologies. Beta-actin was used as a housekeeping gene in the duplexed assays and labelled using HEX and Black Hole Quencher®. Amplicons were detected in real time using a Prism 7500 instrument (Applied Biosystems). Results of the QRT-PCR for each gene across the population tested are shown in Table 12.

Rather than testing all possible combinations of the biomarkers noted in Table 12, in other embodiments all possible combinations of the biomarkers with a p value of less than 0.05 can be chosen, and all or a portion of those combinations tested. Representative classifiers identified for selected combinations are discussed below.

Classifiers were derived for all two-gene combinations of 28 of the biomarkers identified in Table 12, using QRT-PCR data for those genes across 58 individuals having colorectal cancer and 57 individuals not having colorectal cancer. The 28 genes used are represented by the following gene symbols and are described in more detail herein: OSBPL10, LOC283130, BANK1, COBLL1, MGC24039, C9orf85, BLNK, BCNP1, PDE3B, AKAP13, WDFY1, CDA, AGTRAP, ACTR2, UTS2, MS4A1, SPAP1, ANK3, KIAA1559, GBP5, MGC20553, CEACAM1, HIST1H4H, PRG1, BRD2, LTBP3, MAP4K3, and NIPA2. Primers used for the real-time RT-PCR are further described in Table 16. Classifiers derived for selected combinations of two genes are shown as follows:

Table A: Resulting Classifiers for a Selection of Two-Gene Combinations, Wherein the Combinations are in the Form of Ratios and the Biomarkers are Selected from the 28 Genes Selected from Table 12. Shown are the ROC area of each classifier, the sensitivity at 90% specificity, the specificity at 90% sensitivity, the constant, and the coefficient for the selected two-gene ratio.

TABLE A
Ratio | ROC area | Sensitivity (%) at 90% specificity | Specificity (%) at 90% sensitivity | Constant | Coefficient for ratio
BANK1/CDA | 0.8621 | 46.55 | 68.42 | −3.7808 | 1.256
BCNP1/CDA | 0.8451 | 43.10 | 57.89 | −2.7251 | 1.2554
CDA/MS4A1 | 0.8430 | 43.10 | 63.16 | −1.3171 | −1.0944
C9orf85/CDA | 0.8382 | 44.83 | 52.63 | −16.482 | 2.2374
CDA/OSBPL10 | 0.8367 | 56.90 | 47.37 | −5.9699 | −1.2875
CDA/SPAP1 | 0.8364 | 48.28 | 59.65 | −3.3047 | −0.9384
BLNK/CDA | 0.8361 | 50.00 | 68.42 | −4.8603 | 1.2367
CDA/LOC283130 | 0.8339 | 67.24 | 52.63 | −3.9388 | −1.6368
CDA/COBLL1 | 0.8321 | 53.45 | 56.14 | −6.9371 | −1.3126
BANK1/PRG1 | 0.8306 | 55.17 | 54.39 | −8.6282 | 1.2825
BANK1/CEACAM1 | 0.8291 | 56.9 | 56.14 | −1.7783 | 1.1225
BCNP1/PRG1 | 0.8224 | 60.34 | 45.61 | −7.4478 | 1.2631
BANK1/MGC20553 | 0.8197 | 44.83 | 59.65 | 2.2412 | 0.9436
BANK1/NIPA2 | 0.8176 | 58.62 | 50.88 | −2.2946 | 1.6015
ACTR2/BANK1 | 0.8155 | 41.38 | 54.39 | −9.1667 | −1.4596
CDA/HIST1H4H | 0.8134 | 51.72 | 47.37 | 1.6688 | −1.5062
AKAP13/GBP5 | 0.7012 | 22.41 | 22.81 | −0.7784 | 0.6743
BCNP1/CEACAM1 | 0.8131 | 44.83 | 52.63 | −0.8916 | 1.1843
CEACAM1/SPAP1 | 0.8128 | 34.48 | 56.14 | −1.8563 | −0.88
CDA/MAP4K3 | 0.8128 | 39.66 | 45.61 | −9.7566 | −1.7304
BCNP1/GBP5 | 0.8107 | 36.21 | 57.89 | −4.611 | 1.1411

For each two-gene combination, the ROC area is provided together with the sensitivity (when specificity is set at 90%) and the specificity (when sensitivity is set at 90%). The two right-hand columns give the constant and the coefficient for the ratio of each classifier. The classifier for each of the two-gene combinations can be described generically as follows:

$$X\ (\text{having colorectal cancer}) = \mathrm{logit}(P) = \ln\left(\frac{P}{1-P}\right) = b_0 + b_1\,(\Delta\mathrm{Ct}_1 - \Delta\mathrm{Ct}_2)$$

where b₀ is the constant and b₁ is the coefficient for the ratio.
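The two-gene classifier reduces to a single linear term, as sketched below with the BANK1/CDA constant and coefficient from Table A. Treating the first-named gene of each ratio as ΔCt₁ is an assumption, since the text does not say which gene is subtracted. Because ΔCt₁ − ΔCt₂ = (Ct₁ − Ct_ACTB) − (Ct₂ − Ct_ACTB) = Ct₁ − Ct₂, the housekeeping Ct cancels from the classifier.

```python
def ratio_logit(b0: float, b1: float, delta_ct_1: float, delta_ct_2: float) -> float:
    """X = b0 + b1 * (ΔCt1 - ΔCt2) for a two-gene 'ratio' classifier."""
    return b0 + b1 * (delta_ct_1 - delta_ct_2)


def ratio_logit_from_ct(b0: float, b1: float, ct_1: float, ct_2: float, ct_actb: float) -> float:
    """Same classifier from raw Cts; ct_actb cancels in the difference."""
    return ratio_logit(b0, b1, ct_1 - ct_actb, ct_2 - ct_actb)


# BANK1/CDA row of Table A (ASSUMPTION: BANK1 is gene 1, CDA is gene 2).
BANK1_CDA = (-3.7808, 1.256)

if __name__ == "__main__":
    # Hypothetical raw Cts for one sample, evaluated with two different ACTB readings.
    for actb in (18.0, 20.0):
        print(ratio_logit_from_ct(*BANK1_CDA, ct_1=28.0, ct_2=25.0, ct_actb=actb))
```

Both iterations print the same X, illustrating why a ratio classifier needs no housekeeping gene; swapping numerator and denominator only flips the sign of b₁.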

Example 8

Classifiers were also derived for all possible combinations of a selection of nine of the biomarkers identified in Table 12. Data for generating the classifiers were obtained using real-time RT-PCR for each of the nine genes across the same 58 individuals having colorectal cancer and 57 individuals not having colorectal cancer described in Example 7. The nine genes used are represented by the following gene symbols and are described in more detail herein: CDA, MGC20553, MS4A1, BCNP1, BANK1, GBP5, OSBPL10, SPAP1, LOC283130. Primers used for the real-time RT-PCR are further described in Table 16. A selection of the resulting classifiers is shown in Table B below:

TABLE B
# genes | ROC area | Constant | Coefficients, in gene order CDA, MGC20553, MS4A1, BCNP1, BANK1, GBP5, OSBPL10, SPAP1, LOC283130 (genes absent from a classifier are omitted)
1 | 0.8364 | −14.48 | 1.61
1 | 0.8197 | −10.07 | 1.25
1 | 0.8091 | −10.56 | 1.02
2 | 0.8681 | −21.35 | 1.39 0.77
2 | 0.8648 | −7.58 | −0.93 1.41
2 | 0.8594 | −21.80 | 1.27 0.81
3 | 0.8839 | −12.85 | −1.26 1.20 0.93
3 | 0.8811 | −13.04 | −1.25 1.06 0.97
3 | 0.8757 | −10.48 | −1.20 1.11 0.94
4 | 0.8963 | −9.09 | −1.47 1.34 −0.72 0.93
4 | 0.8917 | −7.25 | −1.29 −0.51 1.17 1.02
4 | 0.8902 | −5.32 | −1.59 1.11 −0.88 1.14
5 | 0.8987 | −10.60 | −1.42 0.99 0.43 −0.71 0.92
5 | 0.8984 | −8.84 | −1.63 1.16 −0.80 0.75 0.52
5 | 0.8969 | −9.34 | −1.65 0.94 −0.72 0.69 0.75
6 | 0.9035 | −10.25 | −1.57 0.82 0.43 −0.78 0.73 0.52
6 | 0.9005 | −10.91 | −1.42 −0.32 1.13 0.62 −0.72 0.91
6 | 0.9005 | −10.64 | −1.41 1.19 0.57 −0.71 0.88 −0.26
7 | 0.9053 | −10.60 | −1.57 −0.31 0.96 0.61 −0.79 0.73 0.51
7 | 0.9020 | −10.31 | −1.55 0.98 0.53 −0.77 0.71 −0.20 0.49
7 | 0.9014 | −11.32 | −1.59 0.14 0.82 0.46 −0.87 0.71 0.52
8 | 0.9056 | −10.57 | −1.57 −0.27 0.99 0.62 −0.79 0.72 −0.06 0.50
8 | 0.9044 | −11.30 | −1.59 0.09 −0.29 0.95 0.62 −0.85 0.71 0.51
8 | 0.9026 | −11.17 | −1.57 0.11 0.96 0.55 −0.85 0.70 −0.18 0.49
9 | 0.9041 | −11.26 | −1.58 0.09 −0.25 0.98 0.63 −0.85 0.71 −0.06 0.50

The classifiers shown above can be described generically by the following equation:

$$X\ (\text{possibility of having colorectal cancer}) = \operatorname{logit}(P) = \ln\!\left(\frac{P}{1-P}\right) = b_0 + b_1\,\Delta Ct_1 + b_2\,\Delta Ct_2 + \cdots + b_n\,\Delta Ct_n$$

Selected classifiers are shown above for combinations of 1, 2, 3, 4, 5, 6, 7, 8 or all 9 of the selected genes. The area under the ROC curve is noted (ROC area). Also provided are the constant b₀ and the coefficient for each biomarker (e.g., bₙ) that the selected classifier applies to the ΔCt of the noted gene for a test sample. Where no coefficient is noted, that biomarker is not required for the classifier. Additional sensitivity and specificity determinations can be made for each classifier and will vary depending on the threshold set.
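One way to hold a row of Table B is as a constant plus a mapping from gene symbol to coefficient, so that a gene with no coefficient is simply absent and its ΔCt is not required. The sketch below is a hypothetical helper, not software described in this document. It uses the nine-gene row, the only row in which all nine coefficients appear, so each value maps unambiguously to its gene column.

```python
from typing import Dict

# Nine-gene classifier from Table B: constant b0 and coefficient bn per gene.
NINE_GENE_CONSTANT = -11.26
NINE_GENE_COEFFICIENTS: Dict[str, float] = {
    "CDA": -1.58, "MGC20553": 0.09, "MS4A1": -0.25,
    "BCNP1": 0.98, "BANK1": 0.63, "GBP5": -0.85,
    "OSBPL10": 0.71, "SPAP1": -0.06, "LOC283130": 0.50,
}


def logit_score(constant: float, coefficients: Dict[str, float],
                delta_ct: Dict[str, float]) -> float:
    """X = b0 + sum of bn * ΔCtn over the genes that have a coefficient.

    ΔCt values for genes without a coefficient are ignored; a missing ΔCt
    for a required gene raises KeyError.
    """
    missing = sorted(g for g in coefficients if g not in delta_ct)
    if missing:
        raise KeyError(f"ΔCt required but not supplied for: {missing}")
    return constant + sum(b * delta_ct[g] for g, b in coefficients.items())
```

A smaller classifier from the table would be passed with only its non-zero coefficients, so ΔCt values for the other genes need not be measured.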

The results of all of the combinations of the nine genes are represented graphically in FIG. 5.

Example 9

All possible combinations of the biomarkers identified in Table 12 are tested by applying logistic regression to data corresponding to the level of product of the biomarkers noted. QRT-PCR is conducted on each of the genes noted in Table 12 for a population of individuals who have been diagnosed as either having or not having colorectal cancer. A matrix containing the ΔCt for RNA corresponding to each gene for each individual of the two populations is created, and classifiers are derived for each possible combination of the biomarkers listed in Table 12 (whether individually statistically significant or not) using techniques as described herein. For each classifier, the ROC curve is plotted and the area under the curve determined. Classifiers are chosen according to the sensitivity and specificity requirements of the specific intended use of the biomarkers. For example, it may be desirable to have high sensitivity, and therefore fewer false negatives, so as to miss fewer colorectal cancers; alternatively, high specificity, which results in fewer false positives, is also desirable because it can decrease the cost of unnecessary additional medical interventions. A blind test is conducted on one or more of the resulting classifiers to demonstrate the utility of the classifier in testing for colorectal cancer in a test individual. One or more of the classifiers is then applied to test an individual to determine the likelihood of that test subject having colorectal cancer.
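The two operating-point statistics used in these examples, sensitivity at 90% specificity and specificity at 90% sensitivity, can be read off a classifier's scores by sweeping the cut-off. A minimal NumPy sketch follows, assuming that a sample is called positive when its score X is at or above the cut-off (as in Example 10). The function names and the toy scores are illustrative only.

```python
import numpy as np


def sensitivity_specificity(scores, labels, cutoff):
    """Call 'has colorectal cancer' when score >= cutoff; return (sensitivity, specificity)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    called = scores >= cutoff
    sensitivity = called[labels].mean()      # fraction of cases called positive
    specificity = (~called[~labels]).mean()  # fraction of controls called negative
    return float(sensitivity), float(specificity)


def specificity_at_sensitivity(scores, labels, min_sensitivity=0.90):
    """Best (cutoff, sensitivity, specificity) among cut-offs meeting the sensitivity floor."""
    best = None
    for cutoff in np.unique(scores):
        sens, spec = sensitivity_specificity(scores, labels, cutoff)
        if sens >= min_sensitivity and (best is None or spec > best[2]):
            best = (float(cutoff), sens, spec)
    return best


def sensitivity_at_specificity(scores, labels, min_specificity=0.90):
    """Best (cutoff, sensitivity, specificity) among cut-offs meeting the specificity floor."""
    best = None
    for cutoff in np.unique(scores):
        sens, spec = sensitivity_specificity(scores, labels, cutoff)
        if spec >= min_specificity and (best is None or sens > best[1]):
            best = (float(cutoff), sens, spec)
    return best


# Toy scores (X values) and labels (1 = colorectal cancer), for illustration only.
scores = [0.1, 0.4, 0.35, 0.8, 0.9, 0.2, 0.7, 0.6]
labels = [0, 0, 1, 1, 1, 0, 1, 0]
print(specificity_at_sensitivity(scores, labels))
print(sensitivity_at_specificity(scores, labels))
```

Only observed scores are tried as cut-offs, because any cut-off between two adjacent scores produces the same calls as the next higher observed score.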

Example 10 Blind Testing of One of the Combinations of the Genes in Table 12 Using the Derived Classifier

A five-gene combination comprising B-cell scaffold protein with ankyrin repeats (BANK1), B-cell novel protein 1 (BCNP1), cytidine deaminase (CDA), membrane-spanning 4-domains, subfamily A, member 1 (MS4A1) and FERM domain containing 3 (MGC20553, also known as FRMD3) was selected for further blind sample testing.

The reference (training) data set was constructed from ΔCt values for the above five genes assayed across a total of 58 subjects having colorectal cancer and 57 subjects not having colorectal cancer, as described in more detail in Example 7.

Logistic regression was used to analyze the dependence of the binary variable Y (0 = control, 1 = disease) on the ΔCt values from the reference data set. If P is the probability that a patient sample is identified as “having colorectal cancer”, then a function X = Logit(P) can be defined as follows:

$$X = \operatorname{logit}(P) = \ln\!\left(\frac{P}{1-P}\right) = b_0 + b_1\,\Delta Ct_1 + b_2\,\Delta Ct_2 + \cdots + b_n\,\Delta Ct_n \qquad \text{(Eq. 1)}$$

If X ≥ threshold, then Y = 1 (diagnosis = “has colorectal cancer”), and if X < threshold, then Y = 0 (diagnosis = “does not have colorectal cancer”). The (empirical) coefficients {bᵢ} that define the relationship between X and the experimental measurements {ΔCtᵢ}, where i indexes the genes measured in a sample, were obtained by a maximum-likelihood (ML) fitting method. Identical {bᵢ} values were obtained using different commercial software programs: MedCalc (MedCalc Software, Mariakerke, Belgium) and XL-Stat (Addinsoft Software, Paris, France). ROC curve analysis was then used to evaluate the discriminatory power of the combinations. The classifier derived using the selected genes (BANK1, BCNP1, CDA, MS4A1 and MGC20553) is as follows:

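$$X = \operatorname{logit}(P) = \ln\!\left(\frac{P}{1-P}\right) = -5.1338 - 0.8399\,\Delta Ct_{\mathrm{CDA}} - 0.3314\,\Delta Ct_{\mathrm{MGC20553}} - 0.3245\,\Delta Ct_{\mathrm{MS4A1}} + 1.0903\,\Delta Ct_{\mathrm{BCNP1}} + 0.7842\,\Delta Ct_{\mathrm{BANK1}}$$

This classifier gave an area under the ROC curve of 0.883 ± 0.032. The sensitivity or specificity can be adjusted by changing the test cut-off point, as follows: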

Cut-off | Sensitivity, % (95% C.I.) | Specificity, % (95% C.I.)
−0.53 | 89.7 (78.8-96.1) | 78.9 (66.1-88.6)
−0.31 | 86.2 (74.6-93.8) | 80.7 (68.1-89.9)
−0.04 | 81.0 (68.6-90.1) | 82.5 (70.1-91.2)
0.47 | 72.4 (59.1-83.3) | 87.7 (76.3-94.9)
0.59 | 67.2 (53.7-79.0) | 89.5 (78.5-96.0)
1.46 | 43.1 (30.2-56.8) | 94.7 (85.4-98.8)

Using a cut-off of −1.1 gives a sensitivity of 98.3% and a specificity of 50.9%.
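The Example 10 classifier and its cut-offs can be applied to a single sample as in the following sketch. The constant, coefficients and cut-off values are those given above; the ΔCt values in `sample` are hypothetical, and the helper names are illustrative. Raising the cut-off trades sensitivity for specificity, so the same sample can be called positive at one operating point and negative at another.

```python
import math

# Constant and coefficients of the five-gene classifier of Example 10.
CONSTANT = -5.1338
COEFFICIENTS = {
    "CDA": -0.8399,
    "MGC20553": -0.3314,
    "MS4A1": -0.3245,
    "BCNP1": 1.0903,
    "BANK1": 0.7842,
}
# Cut-offs listed for this classifier in the text above.
CUTOFFS = (-1.1, -0.53, -0.31, -0.04, 0.47, 0.59, 1.46)


def example10_logit(delta_ct: dict) -> float:
    """X = b0 + sum of coefficient * ΔCt over the five genes."""
    return CONSTANT + sum(b * delta_ct[gene] for gene, b in COEFFICIENTS.items())


def classify(delta_ct: dict, cutoff: float) -> tuple:
    """Return (X, P, call); call is True ("has colorectal cancer") when X >= cutoff."""
    x = example10_logit(delta_ct)
    p = 1.0 / (1.0 + math.exp(-x))
    return x, p, x >= cutoff


# Hypothetical ΔCt values (Ct gene - Ct beta-actin) for one test sample.
sample = {"CDA": 3.3, "MGC20553": 9.5, "MS4A1": 4.5, "BCNP1": 6.5, "BANK1": 7.6}

for cutoff in CUTOFFS:
    x, p, has_crc = classify(sample, cutoff)
    label = "has colorectal cancer" if has_crc else "does not have colorectal cancer"
    print(f"cut-off {cutoff:+.2f}: X = {x:.3f}, P = {p:.3f} -> {label}")
```

With these hypothetical inputs X lies between the 0.47 and 0.59 cut-offs, so the sample is called positive at every cut-off up to 0.47 and negative at 0.59 and 1.46.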

A first blind test was performed using a scoring population of 15 individuals not having colorectal cancer and 6 individuals having colorectal cancer, all selected from a single site in Penang. The first blind test resulted in a sensitivity of 100% and a specificity of 43%. A second blind test was performed using a second scoring population (non-overlapping with the first) of 31 patients without colorectal cancer and 23 patients with colorectal cancer, collected from three different sites in Asia. This test resulted in a sensitivity of 100% (all samples with colorectal cancer correctly identified) and a specificity of 47% (almost half of the samples without colorectal cancer correctly identified). A final blind test was performed using samples obtained from two clinics in North America, including 15 colorectal cancer patients and 16 patients without colorectal cancer, and resulted in a sensitivity of 88% and a specificity of 33%.
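The maximum-likelihood fitting and ROC analysis described in Example 10 can be sketched with NumPy as below. This is a generic Newton-Raphson implementation of logistic regression, not the MedCalc or XL-Stat procedure, and it runs on synthetic ΔCt data generated in the script rather than on any data from this document; the function names are illustrative. The ROC area is computed as the probability that a randomly chosen case scores higher than a randomly chosen control (ties counted as one half), which equals the area under the empirical ROC curve.

```python
import numpy as np


def fit_logistic(delta_ct, y, max_iter=50, tol=1e-10):
    """Maximum-likelihood coefficients [b0, b1, ..., bn] by Newton-Raphson."""
    X = np.column_stack([np.ones(len(y)), delta_ct])  # column of ones carries b0
    b = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-X @ b))
        gradient = X.T @ (y - p)                             # score of the log-likelihood
        information = X.T @ (X * (p * (1.0 - p))[:, None])   # negative Hessian
        step = np.linalg.solve(information, gradient)
        b = b + step
        if np.max(np.abs(step)) < tol:
            break
    return b


def predict_logit(b, delta_ct):
    """X = b0 + b1*ΔCt1 + ... + bn*ΔCtn for each row of delta_ct."""
    return b[0] + np.asarray(delta_ct) @ b[1:]


def roc_auc(scores, y):
    """Area under the ROC curve: P(case score > control score), ties counted as 1/2."""
    pos, neg = scores[y == 1], scores[y == 0]
    diff = pos[:, None] - neg[None, :]
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / (pos.size * neg.size)


# Synthetic ΔCt data for two genes (not data from this document).
rng = np.random.default_rng(0)
y = np.r_[np.ones(58), np.zeros(57)]
delta_ct = rng.normal(size=(115, 2)) + np.outer(y, [0.8, -0.5])
b = fit_logistic(delta_ct, y)
print("coefficients:", np.round(b, 3))
print("ROC area:", round(float(roc_auc(predict_logit(b, delta_ct), y)), 3))
```

Newton-Raphson converges quickly when the two groups overlap, as real ΔCt data usually do; if a gene combination separated cases from controls perfectly, the ML coefficients would diverge and a penalized or exact method would be needed instead.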

Example 11 Selection of 6 Biomarkers (BCNP1, CD163, CDA, MS4A1, BANK1, MGC20553) to Derive Classifiers which are Particularly Useful in Screening for the Presence of Colorectal Cancer and Application of the Classifiers with the Combinations Selected to Determine Presence of Colorectal Cancer

QRT-PCR was performed on the selected genes BCNP1, CD163, CDA, MS4A1, BANK1 and MGC20553, identified from one or more microarray analyses (selected from Table 12 and Table 11). QRT-PCR data were most recently collected on RNA samples from centrifuged lysed blood from 119 samples: 60 individuals having colorectal cancer and 59 individuals not having colorectal cancer. QRT-PCR was performed as a duplexed TaqMan assay using the QuantiTect® Probe PCR Kit (Qiagen, Cat. #204345). TaqMan® probes were developed for each gene of interest and labelled with FAM and the Black Hole Quencher® from Biosearch Technologies. Beta-actin was used as the housekeeping gene in the duplexed assays and was labelled with HEX and Black Hole Quencher®. ΔCt values (Ct of the gene of interest − Ct of beta-actin) were calculated.

Table C below shows the average QRT-PCR results for each gene across the population tested. The average ΔCt (Ct gene − Ct beta-actin) of each gene is shown for the control population and for the population having colorectal cancer (CRC), together with the standard deviation (SD) within each population. The p-value for each gene individually is shown. The difference between the ΔCt of the control samples and the ΔCt of the CRC samples for each gene is shown as ΔΔCt, together with its standard deviation. The fold change for each individual marker is also shown.

TABLE C
Measure | BANK1 | BCNP1 | CDA | MGC20553 | MS4A1 | CD163
Control (non-CRC): average ΔCt | 7.5998 | 6.5558 | 3.4814 | 9.7375 | 3.9940 | 5.2012
Control (non-CRC): SD | 0.767 | 0.907 | 0.491 | 1.036 | 0.974 | 0.381
Colorectal cancer (CRC): average ΔCt | 8.3398 | 7.6368 | 3.2070 | 9.3773 | 5.1298 | 5.3896
Colorectal cancer (CRC): SD | 0.708 | 0.893 | 0.539 | 0.867 | 1.041 | 0.482
p-value (CRC v. Control) | 2.6E−07 | 1.6E−09 | 4.4E−03 | 4.2E−02 | 1.1E−08 | 2.0E−02
ΔΔCt (ΔCt Control − ΔCt CRC) | −0.7400 | −1.0810 | 0.2743 | 0.3602 | −1.1359 | −0.1884
ΔΔCt: SD | 1.044 | 1.273 | 0.729 | 1.351 | 1.425 | 0.614
Regulation (direction, CRC v. Control) | Down regulated | Down regulated | Up regulated | Up regulated | Down regulated | Down regulated
Fold change (CRC v. Control) | 1.67 | 2.12 | 1.21 | 1.28 | 2.20 | 1.14
Fold change: SD | 2.06 | 2.42 | 1.66 | 2.55 | 2.69 | 1.53
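The derived rows of Table C follow from the averages by simple arithmetic: ΔΔCt is the control average minus the CRC average, and each fold change matches 2 raised to |ΔΔCt| (the fold-change SD row likewise matches 2 raised to the ΔΔCt SD). That relationship assumes roughly 100% amplification efficiency. The Python sketch below reproduces the fold-change and regulation rows from the averages; it is illustrative and is not the analysis software used to build the table.

```python
# Average ΔCt values (Ct gene - Ct beta-actin) from Table C.
CONTROL_MEAN = {"BANK1": 7.5998, "BCNP1": 6.5558, "CDA": 3.4814,
                "MGC20553": 9.7375, "MS4A1": 3.9940, "CD163": 5.2012}
CRC_MEAN = {"BANK1": 8.3398, "BCNP1": 7.6368, "CDA": 3.2070,
            "MGC20553": 9.3773, "MS4A1": 5.1298, "CD163": 5.3896}


def delta_delta_ct(control: dict, crc: dict) -> dict:
    """ΔΔCt = ΔCt(control) - ΔCt(CRC) for each gene."""
    return {gene: control[gene] - crc[gene] for gene in control}


def fold_change(ddct: float) -> float:
    """Magnitude of the expression change, assuming the template doubles every cycle."""
    return 2.0 ** abs(ddct)


def regulation(ddct: float) -> str:
    """A negative ΔΔCt means a higher ΔCt, i.e. less transcript, in CRC samples."""
    return "Down" if ddct < 0 else "Up"


for gene, ddct in delta_delta_ct(CONTROL_MEAN, CRC_MEAN).items():
    print(f"{gene:9s} ΔΔCt = {ddct:+.4f}  fold change = {fold_change(ddct):.2f}  {regulation(ddct)}")
```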

A matrix of the individual QRT-PCR data corresponding to each of the six biomarkers, across a population of 60 individuals having colorectal cancer and 59 individuals not having colorectal cancer, was generated using the methods noted above. The matrix was used to generate classifiers for all possible combinations of the six biomarkers by applying logistic regression to the data of each combination. All of the combinations, and the biomarkers making up each combination, are listed in Table D below. The ROC curve for each resulting classifier was determined in order to assess the ability of each classifier to correctly identify a test patient as having colorectal cancer. The sensitivity at a predefined specificity and the specificity at a predefined sensitivity can be determined from the ROC curve. The resulting classifiers are shown in Table E, and their results are shown graphically in FIG. 5.
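For n biomarkers there are 2ⁿ − 1 non-empty combinations: 63 for the six genes here (6 single genes, 15 pairs, 20 triples, 15 sets of four, 6 sets of five and 1 set of all six, matching the row counts of Table E) and 511 for the nine genes of Example 8. The enumeration step can be sketched in Python as follows; the fitting step itself is left out, and the names are illustrative.

```python
from itertools import combinations

SIX_GENES = ["BANK1", "BCNP1", "CDA", "MGC20553", "MS4A1", "CD163"]


def all_combinations(genes):
    """Every non-empty subset of a biomarker panel, smallest subsets first."""
    return [combo for k in range(1, len(genes) + 1)
            for combo in combinations(genes, k)]


combos = all_combinations(SIX_GENES)
sizes = {k: sum(1 for c in combos if len(c) == k) for k in range(1, len(SIX_GENES) + 1)}
print(len(combos), sizes)
# Each subset would then be passed to a logistic-regression fit and ROC analysis.
```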

Table D. Each combination is identified by a unique combination number ranging from 1 to 63. The number 1 in a column below a noted gene indicates the presence of that gene within the generated classifier.

TABLE D Combination # BANK1Gene01 BCNP1Gene02 CDAGene03 MS4A1Gene04MGC20553Gene05 CD163Gene06 1 1 2 1 3 1 1 4 1 5 1 1 6 1 1 7 1 1 1 8 1 9 11 10 1 1 11 1 1 1 12 1 1 13 1 1 1 14 1 1 1 15 1 1 1 1 16 1 17 1 1 18 1 119 1 1 1 20 1 1 21 1 1 1 22 1 1 1 23 1 1 1 1 24 1 1 25 1 1 1 26 1 1 1 271 1 1 1 28 1 1 1 29 1 1 1 1 30 1 1 1 1 31 1 1 1 1 1 32 1 33 1 1 34 1 135 1 1 1 36 1 1 37 1 1 1 38 1 1 1 39 1 1 1 1 40 1 1 41 1 1 1 42 1 1 1 431 1 1 1 44 1 1 1 45 1 1 1 1 46 1 1 1 1 47 1 1 1 1 1 48 1 1 49 1 1 1 50 11 1 51 1 1 1 1 52 1 1 1 53 1 1 1 1 54 1 1 1 1 55 1 1 1 1 1 56 1 1 1 57 11 1 1 58 1 1 1 1 59 1 1 1 1 1 60 1 1 1 1 61 1 1 1 1 1 62 1 1 1 1 1 63 11 1 1 1 1

In preferred embodiments, data representing levels of products of any combination of two of the six biomarkers in a sample isolated or derived from a test subject are input to a formula as further described herein, for the purpose of providing a probability that the test subject has one or more colorectal pathologies. In certain other embodiments, data representing levels of products of any combination of three, four, or five of the six biomarkers are input to a formula as further described herein. It is also consistent with the methods described herein that data representing levels of products of all six of the six biomarkers are input to a formula to provide a probability that a test subject has one or more colorectal pathologies.

Table E. Classifiers resulting from each possible combination of the six biomarkers. The combination number corresponds to the combinations listed in Table D. The number of genes contributing to each combination is given, and the coefficient for each biomarker is given within the row. Where 0 is given as a coefficient, that biomarker does not contribute to the resulting classifier. The area under the ROC curve (ROC area) is given, as are the sensitivity at 90% specificity and the specificity at 90% sensitivity.

TABLE E
Comb # | # of genes | ROC area | Sensitivity @ 90% spec. (%) | Specificity @ 90% sens. (%) | Constant | BANK1 | BCNP1 | CDA | MGC20553 | MS4A1 | CD163
2 | 1 | 0.8065 | 48.33 | 50.85 | −9.4336 | 0 | 1.3378 | 0 | 0 | 0 | 0
16 | 1 | 0.8062 | 33.33 | 40.68 | −5.1508 | 0 | 0 | 0 | 0 | 1.1456 | 0
1 | 1 | 0.7664 | 25.00 | 40.68 | −10.646 | 1.3395 | 0 | 0 | 0 | 0 | 0
8 | 1 | 0.6531 | 11.67 | 18.64 | 3.9001 | 0 | 0 | 0 | −0.40595 | 0 | 0
4 | 1 | 0.6421 | 26.67 | 15.25 | 3.5585 | 0 | 0 | −1.0592 | 0 | 0 | 0
32 | 1 | 0.6232 | 18.33 | 18.64 | −5.4467 | 0 | 0 | 0 | 0 | 0 | 1.0322
48 | 2 | 0.8285 | 36.67 | 61.02 | −10.284 | 0 | 0 | 0 | 0 | 1.1534 | 0.96391
34 | 2 | 0.8184 | 48.33 | 54.24 | −14.22 | 0 | 1.3294 | 0 | 0 | 0 | 0.91273
6 | 2 | 0.8155 | 33.33 | 52.54 | −6.6582 | 0 | 1.2559 | −0.65891 | 0 | 0 | 0
18 | 2 | 0.8130 | 48.33 | 50.85 | −8.6118 | 0 | 1 | 0 | 0 | 0.3475 | 0
20 | 2 | 0.8102 | 35.00 | 42.37 | −2.2241 | 0 | 0 | −0.78667 | 0 | 1.0765 | 0
3 | 2 | 0.8093 | 48.33 | 47.46 | −10.046 | 0.18289 | 1.2184 | 0 | 0 | 0 | 0
10 | 2 | 0.8068 | 51.67 | 52.54 | −11.01 | 0 | 1.3885 | 0 | 0.12681 | 0 | 0
24 | 2 | 0.8065 | 31.67 | 42.37 | −4.3207 | 0 | 0 | 0 | −7.84E−02 | 1.1282 | 0
17 | 2 | 0.8048 | 28.33 | 42.37 | −3.938 | −0.24835 | 0 | 0 | 0 | 1.3149 | 0
33 | 2 | 0.7921 | 36.67 | 49.15 | −16.106 | 1.3541 | 0 | 0 | 0 | 0 | 1.0097
5 | 2 | 0.7754 | 30.00 | 54.24 | −7.6711 | 1.2227 | 0 | −0.61344 | 0 | 0 | 0
9 | 2 | 0.7718 | 35.00 | 52.54 | −8.8519 | 1.2937 | 0 | 0 | −0.14952 | 0 | 0
36 | 2 | 0.7113 | 20.00 | 27.12 | −2.8691 | 0 | 0 | −1.391 | 0 | 0 | 1.4243
40 | 2 | 0.6983 | 15.00 | 25.42 | −1.5982 | 0 | 0 | 0 | −0.47483 | 0 | 1.1646
12 | 2 | 0.6961 | 18.33 | 18.64 | 7.3191 | 0 | 0 | −1.0488 | −0.39666 | 0 | 0
52 | 3 | 0.8319 | 26.67 | 61.02 | −8.1557 | 0 | 0 | −1.0734 | 0 | 1.0665 | 1.3105
49 | 3 | 0.8299 | 38.33 | 64.41 | −9.4457 | −0.16237 | 0 | 0 | 0 | 1.2638 | 0.95548
38 | 3 | 0.8291 | 51.67 | 52.54 | −11.787 | 0 | 1.2127 | −0.90702 | 0 | 0 | 1.1775
50 | 3 | 0.8288 | 43.33 | 55.93 | −13.395 | 0 | 0.96662 | 0 | 0 | 0.38068 | 0.91813
56 | 3 | 0.8285 | 40.00 | 59.32 | −8.8677 | 0 | 0 | 0 | −0.16768 | 1.1205 | 1.0282
35 | 3 | 0.8226 | 50.00 | 52.54 | −15.171 | 0.2494 | 1.1715 | 0 | 0 | 0 | 0.92838
22 | 3 | 0.8192 | 38.33 | 50.85 | −5.7441 | 0 | 0.90176 | −0.66762 | 0 | 0.35836 | 0
21 | 3 | 0.8192 | 25.00 | 42.37 | 1.8168 | −0.71915 | 0 | −0.9327 | 0 | 1.5568 | 0
42 | 3 | 0.8178 | 48.33 | 52.54 | −14.643 | 0 | 1.3458 | 0 | 4.17E−02 | 0 | 0.89513
14 | 3 | 0.8158 | 38.33 | 52.54 | −8.2509 | 0 | 1.3096 | −0.66151 | 0.1273 | 0 | 0
7 | 3 | 0.8155 | 33.33 | 52.54 | −6.6677 | 2.33E−03 | 1.2544 | −0.65841 | 0 | 0 | 0
19 | 3 | 0.8147 | 45.00 | 52.54 | −7.7027 | −0.18144 | 0.99459 | 0 | 0 | 0.47486 | 0
28 | 3 | 0.8124 | 38.33 | 42.37 | −1.6155 | 0 | 0 | −0.78053 | −5.87E−02 | 1.0615 | 0
26 | 3 | 0.8116 | 48.33 | 52.54 | −9.8432 | 0 | 1.0675 | 0 | 9.36E−02 | 0.31563 | 0
11 | 3 | 0.8085 | 51.67 | 47.46 | −11.398 | 0.15545 | 1.2828 | 0 | 0.11619 | 0 | 0
25 | 3 | 0.8068 | 28.33 | 47.46 | −3.1168 | −0.24726 | 0 | 0 | −7.82E−02 | 1.2971 | 0
37 | 3 | 0.8020 | 25.00 | 57.63 | −13.206 | 1.1841 | 0 | −0.93208 | 0 | 0 | 1.3051
41 | 3 | 0.8017 | 38.33 | 55.93 | −13.824 | 1.2908 | 0 | 0 | −0.23193 | 0 | 1.0935
13 | 3 | 0.7788 | 20.00 | 52.54 | −5.7872 | 1.1714 | 0 | −0.61629 | −0.15341 | 0 | 0
44 | 3 | 0.7500 | 23.33 | 49.15 | 0.93831 | 0 | 0 | −1.3775 | −0.46832 | 0 | 1.5416
53 | 4 | 0.8421 | 28.33 | 62.71 | −4.0963 | −0.72824 | 0 | −1.2159 | 0 | 1.5557 | 1.3112
54 | 4 | 0.8359 | 43.33 | 54.24 | −10.864 | 0 | 0.79246 | −0.93889 | 0 | 0.43542 | 1.215
60 | 4 | 0.8345 | 26.67 | 61.02 | −6.7979 | 0 | 0 | −1.0645 | −0.16209 | 1.0289 | 1.3733
57 | 4 | 0.8322 | 41.67 | 64.41 | −8.0548 | −0.15788 | 0 | 0 | −0.16741 | 1.2282 | 1.0198
39 | 4 | 0.8294 | 51.67 | 52.54 | −11.969 | 4.28E−02 | 1.1857 | −0.89948 | 0 | 0 | 1.179
58 | 4 | 0.8288 | 43.33 | 55.93 | −13.371 | 0 | 0.96496 | 0 | −2.24E−03 | 0.38154 | 0.91909
51 | 4 | 0.8285 | 41.67 | 55.93 | −12.968 | −7.93E−02 | 0.9637 | 0 | 0 | 0.43635 | 0.91323
46 | 4 | 0.8285 | 51.67 | 50.85 | −12.048 | 0 | 1.2235 | −0.90636 | 2.49E−02 | 0 | 1.1669
23 | 4 | 0.8237 | 41.67 | 44.07 | −1.9249 | −0.65485 | 0.87804 | −0.81254 | 0 | 0.81182 | 0
43 | 4 | 0.8212 | 50.00 | 52.54 | −15.382 | 0.2432 | 1.1845 | 0 | 2.32E−02 | 0 | 0.91822
30 | 4 | 0.8195 | 43.33 | 47.46 | −6.9954 | 0 | 0.97268 | −0.669 | 9.40E−02 | 0.32588 | 0
29 | 4 | 0.8192 | 28.33 | 47.46 | 2.3538 | −0.71624 | 0 | −0.92591 | −5.37E−02 | 1.5416 | 0
15 | 4 | 0.8164 | 38.33 | 52.54 | −8.1437 | −3.34E−02 | 1.3322 | −0.66871 | 0.12965 | 0 | 0
27 | 4 | 0.8124 | 45.00 | 54.24 | −8.9424 | −0.1764 | 1.0611 | 0 | 9.25E−02 | 0.43969 | 0
45 | 4 | 0.8014 | 30.00 | 67.80 | −10.66 | 1.1055 | 0 | −0.93995 | −0.24668 | 0 | 1.3918
55 | 5 | 0.8410 | 30.00 | 57.63 | −6.9559 | −0.67682 | 0.76786 | −1.0879 | 0 | 0.9056 | 1.2208
61 | 5 | 0.8393 | 28.33 | 64.41 | −2.722 | −0.73359 | 0 | −1.2079 | −0.16199 | 1.523 | 1.3754
62 | 5 | 0.8379 | 43.33 | 55.93 | −10.555 | 0 | 0.76996 | −0.94059 | −2.76E−02 | 0.44682 | 1.228
59 | 5 | 0.8285 | 41.67 | 55.93 | −12.939 | −7.94E−02 | 0.96171 | 0 | −2.68E−03 | 0.43746 | 0.91437
47 | 5 | 0.8277 | 51.67 | 50.85 | −12.173 | 3.61E−02 | 1.1995 | −0.90005 | 2.22E−02 | 0 | 1.1693
31 | 5 | 0.8212 | 41.67 | 40.68 | −3.1633 | −0.65384 | 0.94851 | −0.81521 | 9.32E−02 | 0.77795 | 0
63 | 6 | 0.8407 | 31.67 | 61.02 | −6.5754 | −0.67929 | 0.74043 | −1.0901 | −3.29E−02 | 0.92168 | 1.2365
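The combination numbers in Table E appear to follow a binary encoding over Table E's gene columns, with BANK1 = 1, BCNP1 = 2, CDA = 4, MGC20553 = 8, MS4A1 = 16 and CD163 = 32. For example, combination 8 has its only non-zero coefficient under MGC20553, and combination 48 uses MS4A1 and CD163. This encoding is inferred from the rows of Table E rather than stated in the text, so the decoder below is a sketch under that assumption; the function names are illustrative.

```python
# Table E gene columns; bit i of a combination number is assumed to flag GENES[i].
GENES = ["BANK1", "BCNP1", "CDA", "MGC20553", "MS4A1", "CD163"]


def genes_in_combination(number: int) -> list:
    """Decode a combination number (1-63) into the genes it contains."""
    if not 1 <= number < 2 ** len(GENES):
        raise ValueError(f"combination number out of range: {number}")
    return [gene for i, gene in enumerate(GENES) if (number >> i) & 1]


def combination_number(genes) -> int:
    """Inverse mapping: a set of gene symbols -> combination number."""
    return sum(1 << GENES.index(gene) for gene in set(genes))


for n in (2, 8, 16, 28, 48, 63):
    print(n, genes_in_combination(n))
```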

Primers and probes used for the QRT-PCR are further described in Table 17.

Example 12 Testing a Test Subject for One or More Colorectal Pathologies Using the 6 Biomarkers (BCNP1, CD163, CDA, MS4A1, BANK1, and MGC20553)

A reference population of 200 individuals is used to generate a formula that will be used to test the test subject. Of these individuals, 100 are confirmed by colonoscopy to have colorectal cancer. The remaining 100 individuals are screened for colorectal cancer by colonoscopy and are confirmed to be negative for colorectal cancer. A blood sample is obtained from each of the 200 individuals, and total RNA is isolated from each blood sample. RNA from each sample is reverse transcribed using an oligo dT primer to obtain cDNA corresponding to the total mRNA. QRT-PCR is performed for each of the six genes in each sample, and a ΔCt is generated for each gene with reference to a beta-actin control. A data matrix of all of the data across the population is generated. A classifier is developed using each of the following methods: (a) logistic regression, (b) linear regression, (c) neural networks and (d) principal component analysis. A formula combining the four classifiers, in which each classifier is given equal weight, is then generated (i.e., each classifier, when applied to a test subject, gives an indication of whether the test subject has a colorectal pathology, and the results of the classifiers are tallied so that if, for example, 3 of the 4 classifiers indicate that the test subject has a colorectal pathology, the result of the formula indicates that the test subject has a colorectal pathology).

A blood sample is isolated from a test subject. Total RNA is isolated from the blood sample and cDNA is derived using an oligo dT primer. QRT-PCR is performed for each of the six genes in the sample, and a ΔCt is generated for each gene with reference to a beta-actin control. The data from the test subject's sample are input into the formula consisting of the four classifiers, and the result of each classifier is determined, along with the combined tally of the four classifiers, to provide an indication of whether the test subject has a colorectal pathology, and in particular colorectal cancer.
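The equal-weight tally of Example 12 amounts to a vote among the four classifiers. The sketch below treats each trained classifier as a function returning True for "has colorectal pathology"; the four lambda rules are hypothetical stand-ins with placeholder thresholds, not the logistic-regression, linear-regression, neural-network or principal-component models themselves. The text gives 3 of 4 as an example of a positive result but does not say how a 2-to-2 split is resolved, so the required number of positive calls is an explicit parameter here.

```python
from typing import Callable, Dict, Sequence, Tuple

# A trained classifier maps a sample's ΔCt values to True ("has colorectal pathology").
Classifier = Callable[[Dict[str, float]], bool]


def equal_weight_vote(classifiers: Sequence[Classifier],
                      delta_ct: Dict[str, float],
                      min_positive: int) -> Tuple[int, bool]:
    """Count positive calls (each classifier weighted equally) and compare to min_positive."""
    positives = sum(1 for classify in classifiers if classify(delta_ct))
    return positives, positives >= min_positive


# Hypothetical stand-in rules; thresholds are placeholders, not fitted values.
stand_ins = [
    lambda d: d["BCNP1"] > 7.0,
    lambda d: d["MS4A1"] > 4.5,
    lambda d: d["BANK1"] > 8.0,
    lambda d: d["CDA"] < 3.3,
]

sample = {"BANK1": 8.2, "BCNP1": 7.4, "CDA": 3.4, "MGC20553": 9.5, "MS4A1": 4.9, "CD163": 5.3}
votes, positive = equal_weight_vote(stand_ins, sample, min_positive=3)
print(f"{votes} of {len(stand_ins)} classifiers positive -> formula positive: {positive}")
```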

FULL CITATIONS FOR REFERENCES REFERRED TO BY NUMBER IN THE SPECIFICATION

[1] Ogawa M. Differentiation and proliferation of hematopoietic stem cells. Blood 1993; 81:2844-53.
[2] Liew C C. Method for the detection of gene transcripts in blood and uses thereof. U.S. Patent Application No. 2002000268730.
[3] Ma J, Liew C C. Gene profiling identifies secreted protein transcripts from peripheral blood cells in coronary artery disease. J Mol Cell Cardiol 2003; 35:993-8.
[4] Tsuang M T, Nossova N, Yager T D, Tsuang M M, Guo S C, Shyu K G, et al. Assessing the validity of blood-based gene expression profiles for the classification of schizophrenia and bipolar disorder: A preliminary report. Am J Med Genet B Neuropsychiatr Genet 2005; 133B:1-5.
[5] Marshall K W, Zhang H, Yager T D, Nossova N, Dempsey A, Zheng R, Han M, Tang H, Chao S, Liew C C. Blood-Based Biomarkers for Detecting Mild Osteoarthritis in the Human Knee. Osteoarthritis and Cartilage 2005; 13:861-871.
[6] Whistler T, Unger E R, Nisenbaum R, Vernon S D. Integration of gene expression, clinical, and epidemiologic data to characterize Chronic Fatigue Syndrome. J Transl Med 2003; 1:10.
[7] Bennett L, Palucka A K, Arce E, Cantrell V, Borvak J, Banchereau J, et al. Interferon and granulopoiesis signatures in systemic lupus erythematosus blood. J Exp Med 2003; 197:711-23.
[8] Zhang H Q, Lu H, Enosawa S, Takahara S, Sakamoto K, Nakajima T, et al. Microarray analysis of gene expression in peripheral blood mononuclear cells derived from long-surviving renal recipients. Transplantation Proceedings 2002; 34:1757-9.
[9] Tang Y, Nee A C, Lu A, Ran R, Sharp F R. Blood genomic expression profile for neuronal injury. J Cereb Blood Flow Metab 2003; 23:310-9.
[10] Tang Y, Gilbert D L, Glauser T A, Hershey A D, Sharp F R. Blood Gene Expression Profiling of Neurologic Diseases: A Pilot Microarray Study. Arch Neurol 2005; 62:210-5.
[11] Imperiale T F, Wagner D R, Lin C Y, Larkin G N, Rogge J D, Ransohoff D F. Results of screening colonoscopy among persons 40 to 49 years of age. N Engl J Med 2002 Jun 6; 346(23):1781-5.
[12] Ransohoff D F. Colorectal cancer screening in 2005: status and challenges. Gastroenterology 2005 May; 128(6):1685-95.
[13] Ahluwalia I B, Mack K A, Murphy W, Mokdad A H, Bales V S. State-specific prevalence of selected chronic disease-related characteristics--Behavioral Risk Factor Surveillance System, 2001. MMWR Surveill Summ 2003 Aug 22; 52(8):1-80.
[14] Whitney A R, Diehn M, Popper S J, Alizadeh A A, Boldrick J C, Relman D A, et al. Individuality and variation in gene expression patterns in human blood. Proc Natl Acad Sci USA 2003; 100:1896-901.
[15] Pepe M S. The Statistical Evaluation of Medical Tests for Classification and Prediction. Oxford, England: Oxford University Press; 2003.
[16] Dupont W D. Statistical Modeling for Biomedical Researchers. Cambridge, England: Cambridge University Press; 2002.
[17] Pampel F C. Logistic Regression: A Primer. Publication #07-132. Thousand Oaks, Calif.: Sage Publications; 2000.
[18] King E N, Ryan T P. A preliminary investigation of maximum likelihood logistic regression versus exact logistic regression. Am Statistician 2002; 56:163-170.
[19] Metz C E. Basic principles of ROC analysis. Semin Nucl Med 1978; 8:283-98.
[20] Swets J A. Measuring the accuracy of diagnostic systems. Science 1988; 240:1285-93.
[21] Zweig M H, Campbell G. Receiver-operating characteristic (ROC) plots: a fundamental evaluation tool in clinical medicine. Clin Chem 1993; 39:561-77.
[22] Witten I H, Frank E. Data Mining: Practical Machine Learning Tools and Techniques (second edition). Morgan Kaufmann; 2005.
[23] Deutsch J M. Evolutionary algorithms for finding optimal gene sets in microarray prediction. Bioinformatics 2003; 19:45-52.
[24] Citarda F, Tomaselli G, Capocaccia R, Barcherini S, Crespi M; Italian Multicentre Study Group. Efficacy in standard clinical practice of colonoscopic polypectomy in reducing colorectal cancer incidence. Gut 2001 Jun; 48(6):812-5.
[25] Landwehr N, Hall M, Frank E. Logistic Model Trees. In: Machine Learning: ECML 2003, 14th European Conference on Machine Learning, Cavtat-Dubrovnik, Croatia, Sep. 22-26, 2003, Proceedings. Springer-Verlag GmbH; 2003. pp. 241-252. ISSN: 0302-9743.
[26] Tonkin E T, Wang T-J, Lisgo S, Bamshad M J, Strachan T. NIPBL, encoding a homolog of fungal Scc2-type sister chromatid cohesion proteins and fly Nipped-B, is mutated in Cornelia de Lange syndrome. Nature Genet 2004; 36:636-641. PubMed ID 15146185.
[27] Krantz I D, McCallum J, DeScipio C, Kaur M, Gillis L A, Yaeger D, Jukofsky L, Wasserman N, Bottani A, Morris C A, Nowaczyk M J M, Toriello H, et al. Cornelia de Lange syndrome is caused by mutations in NIPBL, the human homolog of Drosophila melanogaster Nipped-B. Nature Genet 2004; 36:631-635. PubMed ID 15146186.
[28] Nadel M R, Shapiro J A, Klabunde C N, Seeff L C, Uhler R, Smith R A, Ransohoff D F. A national survey of primary care physicians' methods for screening for fecal occult blood. Ann Intern Med 2005 Jan 18; 142(2):86-94.
[29] Scaloni A, Jones W, Pospischil M, Sassa S, Schneewind O, Popowicz A M, Bossa F, Graziano S L, Manning J M. Deficiency of acylpeptide hydrolase in small-cell lung carcinoma cell lines. J Lab Clin Med 1992 Oct; 120(4):546-52.
[30] Erlandsson R, Boldog F, Persson B, Zabarovsky E R, Allikmets R L, Sumegi J, Klein G, Jornvall H. The gene from the short arm of chromosome 3, at D3F15S2, frequently deleted in renal cell carcinoma, encodes acylpeptide hydrolase. Oncogene 1991 Jul; 6(7):1293-5.
[31] Brown M S, Goldstein J L. A proteolytic pathway that controls the cholesterol content of membranes, cells, and blood. Proc Natl Acad Sci USA 1999 Sep 28; 96(20):11041-8.
[32] Adams J C, Seed B, Lawler J. Muskelin, a novel intracellular mediator of cell adhesive and cytoskeletal responses to thrombospondin-1. EMBO J 1998 Sep 1; 17(17):4964-74.
[33] Gillis L A, McCallum J, Kaur M, DeScipio C, Yaeger D, Mariani A, Kline A D, Li H H, Devoto M, Jackson L G, Krantz I D. NIPBL mutational analysis in 120 individuals with Cornelia de Lange syndrome and evaluation of genotype-phenotype correlations. Am J Hum Genet 2004 Oct; 75(4):610-23. Epub 2004 Aug 18.
[34] Takakura S, Kohno T, Manda R, Okamoto A, Tanaka T, Yokota J. Genetic alterations and expression of the protein phosphatase 1 genes in human cancers. Int J Oncol 2001 Apr; 18(4):817-24.
[35] Imperiale T F, Ransohoff D F, Itzkowitz S H, Turnbull B A, Ross M E; Colorectal Cancer Study Group. Fecal DNA versus fecal occult blood for colorectal-cancer screening in an average-risk population. N Engl J Med 2004 Dec 23; 351(26):2704-14.

All patents, patent applications, and published references cited herein are hereby incorporated by reference in their entirety.

One skilled in the art will readily appreciate that the present invention is well adapted to carry out the objects and obtain the ends and advantages mentioned, as well as those objects, ends and advantages inherent herein. The present examples, along with the methods, procedures, treatments, molecules, and specific compounds described herein, are presently representative of preferred embodiments, are exemplary, and are not intended as limitations on the scope of the invention. Changes therein and other uses will occur to those skilled in the art which are encompassed within the spirit of the invention as defined by the scope of the claims.

While this invention has been particularly shown and described with references to preferred embodiments thereof, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the scope of the invention encompassed by the appended claims.

Tables

Table 1 is filed herewith via EFS-Web, identified as file table1.txt.

TABLE 2 Gene Symbol AffySpot p-value(MW) p-value (t-test) Fold changeDirection Gene Description APEH 201284_s_at        6.0e−03 0.91downregulated N-acylaminoacyl-pept C1orf22 220342_x_at <0.0013.15049390173162e−03 1.07 upregulated chromosome 1 open re ESR1215551_at 0.058 0.03 0.87 downregulated estrogen receptor 1 ETS1214447_at 0.132 0.15 0.93 downregulated v-ets erythroblastos FLJ14624225666_at 0.01 0.76 downregulated hypothetical protein FLJ20701 0.010.76 downregulated hypothetical protein FLJ23091 221958_s_at 0.05 0.83downregulated putative NFkB activa G2 234784_at 0.182 0.18 0.78downregulated G2 protein ICOS 210439_at 0.07 0.88 downregulatedinducible T-cell co- ITCH 235057_at <0.001 1.73049856446377e−06 1.13upregulated itchy homolog E3 ubi MBTPS1 217543_s_at        8.0e−03 1.21upregulated membrane-bound trans MGC40157 225065_x_at 0.301 0.04 1.19downregulated hypothetical protein MGC45871 226905_at 0.11 1.35upregulated hypothetical protein MKLN1 242984_at 0.003 0.04 1.06upregulated muskelin 1, intracel MMP9 203936_s_at 0.0067.80154116715436e−03 1.24 upregulated matrix metalloprotei NIPBL1560474_at 0.005 0.01 1.07 upregulated Nipped-B homolog (Dr RPS241555878_at 0.05 0.88 downregulated ribosomal protein S2 SMARCA1203874_s_at 0.077 0.06 0.82 downregulated SWI/SNF related, mat

TABLE 3
Gene Symbol | RNA Accession No. | Protein Accession No. | Description
FLJ14624 | NM_032813 | NP_116202 | Homo sapiens hypothetical protein FLJ14624 (FLJ14624), mRNA
FLJ14624 | NM_032813 | NP_116202 | Homo sapiens hypothetical protein FLJ14624 (FLJ14624), mRNA
ETS1 | NM_005238 | NP_005229 | Homo sapiens v-ets erythroblastosis virus E26 oncogene homolog 1 (avian) (ETS1), mRNA
RPS24 | NM_033022 | NP_148982 | Homo sapiens ribosomal protein S24 (RPS24), transcript variant 1, mRNA
RPS24 | NM_001026 | NP_001017 | Homo sapiens ribosomal protein S24 (RPS24), transcript variant 2, mRNA
FLJ20701 | NM_017933 | NP_060403 | Homo sapiens hypothetical protein FLJ20701 (FLJ20701), mRNA
MKLN1 | NM_013255 | NP_037387 | Homo sapiens muskelin 1, intracellular mediator containing kelch motifs (MKLN1), mRNA
NIPBL | NM_133433 | NP_597677 | Homo sapiens Nipped-B homolog (Drosophila) (NIPBL), transcript variant A, mRNA
NIPBL | NM_015384 | NP_056199 | Homo sapiens Nipped-B homolog (Drosophila) (NIPBL), transcript variant B, mRNA
FLJ20701 | NM_017933 | NP_060403 | Homo sapiens hypothetical protein FLJ20701 (FLJ20701), mRNA
RPS24 | NM_033022 | NP_148982 | Homo sapiens ribosomal protein S24 (RPS24), transcript variant 1, mRNA
RPS24 | NM_001026 | NP_001017 | Homo sapiens ribosomal protein S24 (RPS24), transcript variant 2, mRNA
APEH | NM_001640 | NP_001631 | Homo sapiens N-acylaminoacyl-peptide hydrolase (APEH), mRNA
MBTPS1 | NM_003791 | NP_003782 | Homo sapiens membrane-bound transcription factor protease, site 1 (MBTPS1), transcrip
MBTPS1 | NM_201268 | NP_957720 | Homo sapiens membrane-bound transcription factor protease, site 1 (MBTPS1), transcrip
SMARCA1 | NM_003069 | NP_003060 | Homo sapiens SWI/SNF related, matrix associated, actin dependent regulator of chromat
SMARCA1 | NM_139035 | NP_620604 | Homo sapiens SWI/SNF related, matrix associated, actin dependent regulator of chromat
SMARCA1 | NM_003069 | NP_003060 | Homo sapiens SWI/SNF related, matrix associated, actin dependent regulator of chromat
SMARCA1 | NM_139035 | NP_620604 | Homo sapiens SWI/SNF related, matrix associated, actin dependent regulator of chromat
MMP9 | NM_004994 | NP_004985 | Homo sapiens matrix metalloproteinase 9 (gelatinase B, 92 kDa gelatinase, 92 kDa type I
MKLN1 | NM_013255 | NP_037387 | Homo sapiens muskelin 1, intracellular mediator containing kelch motifs (MKLN1), mRNA
ESR1 | NM_000125 | NP_000116 | Homo sapiens estrogen receptor 1 (ESR1), mRNA
NIPBL | NM_133433 | NP_597677 | Homo sapiens Nipped-B homolog (Drosophila) (NIPBL), transcript variant A, mRNA
NIPBL | NM_015384 | NP_056199 | Homo sapiens Nipped-B homolog (Drosophila) (NIPBL), transcript variant B, mRNA
ITCH | NM_031483 | NP_113671 | Homo sapiens itchy homolog E3 ubiquitin protein ligase (mouse) (ITCH), mRNA
ITCH | NM_031483 | NP_113671 | Homo sapiens itchy homolog E3 ubiquitin protein ligase (mouse) (ITCH), mRNA
ICOS | NM_012092 | NP_036224 | Homo sapiens inducible T-cell co-stimulator (ICOS), mRNA
ESR1 | NM_000125 | NP_000116 | Homo sapiens estrogen receptor 1 (ESR1), mRNA
ESR1 | NM_000125 | NP_000116 | Homo sapiens estrogen receptor 1 (ESR1), mRNA
ESR1 | NM_000125 | NP_000116 | Homo sapiens estrogen receptor 1 (ESR1), mRNA
ESR1 | NM_000125 | NP_000116 | Homo sapiens estrogen receptor 1 (ESR1), mRNA
NIPBL | NM_133433 | NP_597677 | Homo sapiens Nipped-B homolog (Drosophila) (NIPBL), transcript variant A, mRNA
NIPBL | NM_015384 | NP_056199 | Homo sapiens Nipped-B homolog (Drosophila) (NIPBL), transcript variant B, mRNA
NIPBL | NM_133433 | NP_597677 | Homo sapiens Nipped-B homolog (Drosophila) (NIPBL), transcript variant A, mRNA
NIPBL | NM_015384 | NP_056199 | Homo sapiens Nipped-B homolog (Drosophila) (NIPBL), transcript variant B, mRNA
NIPBL | NM_133433 | NP_597677 | Homo sapiens Nipped-B homolog (Drosophila) (NIPBL), transcript variant A, mRNA
NIPBL | NM_015384 | NP_056199 | Homo sapiens Nipped-B homolog (Drosophila) (NIPBL), transcript variant B, mRNA
ETS1 | NM_005238 | NP_005229 | Homo sapiens v-ets erythroblastosis virus E26 oncogene homolog 1 (avian) (ETS1), mRNA
SMARCA1 | NM_003069 | NP_003060 | Homo sapiens SWI/SNF related, matrix associated, actin dependent regulator of chromat
SMARCA1 | NM_139035 | NP_620604 | Homo sapiens SWI/SNF related, matrix associated, actin dependent regulator of chromat
ESR1 | NM_000125 | NP_000116 | Homo sapiens estrogen receptor 1 (ESR1), mRNA
ESR1 | NM_000125 | NP_000116 | Homo sapiens estrogen receptor 1 (ESR1), mRNA
ITCH | NM_031483 | NP_113671 | Homo sapiens itchy homolog E3 ubiquitin protein ligase (mouse) (ITCH), mRNA
ESR1 | NM_000125 | NP_000116 | Homo sapiens estrogen receptor 1 (ESR1), mRNA
ESR1 | NM_000125 | NP_000116 | Homo sapiens estrogen receptor 1 (ESR1), mRNA
MBTPS1 | NM_003791 | NP_003782 | Homo sapiens membrane-bound transcription factor protease, site 1 (MBTPS1), transcrip
MBTPS1 | NM_201268 | NP_957720 | Homo sapiens membrane-bound transcription factor protease, site 1 (MBTPS1), transcrip
FLJ20701 | NM_017933 | NP_060403 | Homo sapiens hypothetical protein FLJ20701 (FLJ20701), mRNA
C1orf22 | NM_025191 | NP_079467 | Homo sapiens chromosome 1 open reading frame 22 (C1orf22), mRNA
C1orf22 | NM_025191 | NP_079467 | Homo sapiens chromosome 1 open reading frame 22 (C1orf22), mRNA
FLJ23091 | NM_024911 | NP_079187 | Homo sapiens putative NFkB activating protein 373 (FLJ23091), transcript variant 1, m
FLJ23091 | NM_001002292 | NP_001002292 | Homo sapiens putative NFkB activating protein 373 (FLJ23091), transcript variant 2, m
C1orf22 | NM_025191 | NP_079467 | Homo sapiens chromosome 1 open reading frame 22 (C1orf22), mRNA
ETS1 | NM_005238 | NP_005229 | Homo sapiens v-ets erythroblastosis virus E26 oncogene homolog 1 (avian) (ETS1), mRNA
MGC40157 | NM_152350 | NP_689563 | Homo sapiens hypothetical protein MGC40157 (MGC40157), mRNA
MKLN1 | NM_013255 | NP_037387 | Homo sapiens muskelin 1, intracellular mediator containing kelch motifs (MKLN1), mRNA
FLJ14624 | NM_032813 | NP_116202 | Homo sapiens hypothetical protein FLJ14624 (FLJ14624), mRNA
MGC45871 | NM_182705 | NP_874364 | Homo sapiens hypothetical protein MGC45871 (MGC45871), mRNA
MGC45871 | NM_182705 | NP_874364 | Homo sapiens hypothetical protein MGC45871 (MGC45871), mRNA
FLJ23091 | NM_024911 | NP_079187 | Homo sapiens putative NFkB activating protein 373 (FLJ23091), transcript variant 1, m
FLJ23091 | NM_001002292 | NP_001002292 | Homo sapiens putative NFkB activating protein 373 (FLJ23091), transcript variant 2, m
FLJ23091 | NM_024911 | NP_079187 | Homo sapiens putative NFkB activating protein 373 (FLJ23091), transcript variant 1, m
FLJ23091 | NM_001002292 | NP_001002292 | Homo sapiens putative NFkB activating protein 373 (FLJ23091), transcript variant 2, m
MKLN1 | NM_013255 | NP_037387 | Homo sapiens muskelin 1, intracellular mediator containing kelch motifs (MKLN1), mRNA
ESR1 | NM_000125 | NP_000116 | Homo sapiens estrogen receptor 1 (ESR1), mRNA
ITCH | NM_031483 | NP_113671 | Homo sapiens itchy homolog E3 ubiquitin protein ligase (mouse) (ITCH), mRNA
MKLN1 | NM_013255 | NP_037387 | Homo sapiens muskelin 1, intracellular mediator containing kelch motifs (MKLN1), mRNA
FLJ20701 | NM_017933 | NP_060403 | Homo sapiens hypothetical protein FLJ20701 (FLJ20701), mRNA
FLJ20701 | NM_017933 | NP_060403 | Homo sapiens hypothetical protein FLJ20701 (FLJ20701), mRNA
ITCH | NM_031483 | NP_113671 | Homo sapiens itchy homolog E3 ubiquitin protein ligase (mouse) (ITCH), mRNA
NIPBL | NM_133433 | NP_597677 | Homo sapiens Nipped-B homolog (Drosophila) (NIPBL), transcript variant A, mRNA
NIPBL | NM_015384 | NP_056199 | Homo sapiens Nipped-B homolog (Drosophila) (NIPBL), transcript variant B, mRNA
MKLN1 | NM_013255 | NP_037387 | Homo sapiens muskelin 1, intracellular mediator containing kelch motifs (MKLN1), mRNA
MKLN1 | NM_013255 | NP_037387 | Homo sapiens muskelin 1, intracellular mediator containing kelch motifs (MKLN1), mRNA

TABLE 4
Gene Symbol | Sense Primer | SEQ ID NO | Antisense Primer | SEQ ID NO | TaqMan Probe | SEQ ID NO
APEH | GCCCTGTATTATGTGGACCT | 54 | AGATGGGTACTGCAGGTAGA | 55 | CCGGCTGAGCCCAGACCAAT | 56
APEH | CACTCGGAGACACACTTGTT | 57 | CTTGGTCTGGCTTCTTCAG | 58 | CATCGCTGGCACTGACGTCCA | 59
APEH | ACTCGGAGACACACTTGTTG | 60 | CTTGGTCTGGCTTCTTCAG | 61 | CATCGCTGGCACTGACGTCCA | 62
APEH | AGTGGTGGTAGATGTTGTGC | 63 | AGACCACTCTCTGGCTGTC | 64 | TGCAGCCTTCTGCCTTTGGGA | 65
APEH | CACTTGTTGTATGTGGCAGA | 66 | GCCTGGCTATCTCATCATC | 67
C1orf22 | TAGGGAGGAGAAACAGAAGC | 68 | GGCCTCTAACTCGACCTCTA | 69 | TGGAACATGCTTACCCTGCTGATGA | 70
C1orf22 | TGTGGTGGATAAGAGCTGTC | 71 | CCCATCTTCTTCAGGATTTC | 72 | TGGCCATGAAATCTCTGGCTCTCA | 73
C1orf22 | GAGGAGAGTTTCAGGAGTGG | 74 | CTCCCATCTTTGAGGTGAAT | 75 | TGGCCATGAAATCTCTGGCTCTCA | 76
C1orf22 | TTGCTTGGAGATGACAGTTT | 77 | AGCATTGGTTTGTGGATATG | 78
ESR1 | GGCACATCTTCTGTCTTCTG | 79 | CTGTGAAGAGCTACGGGAAT | 80 | TGGAATCCCTTTGGCTGTTCCC | 81
ESR1 | TGGCACATCTTCTGTCTTCT | 82 | CTGTGAAGAGCTACGGGAAT | 83 | TGGAATCCCTTTGGCTGTTCCC | 84
ESR1 | CCCTACTACCTGGAGAACGA | 85 | ATTGGTACTGGCCAATCTTT | 86 | CTGCCACCCTGGCGTCGATT | 87
ESR1 | CACCATTCCCAAGTTAATCC | 88 | GAAATGCAGTTGGAAACAGA | 89 | TGGGACCAAAGTTCATTTGCTCCA | 90
ESR1 | TGCCCTACTACCTGGAGAAC | 91 | ATTGGTACTGGCCAATCTTT | 92 | AGCCCAGCGGCTACACGGTG | 93
ESR1 | GTGCCCTACTACCTGGAGAA | 94 | ATTGGTACTGGCCAATCTTT | 95 | AGCCCAGCGGCTACACGGTG | 96
ESR1 | TGTGCAATGACTATGCTTCA | 97 | TTATCAATGGTGCACTGGTT | 98
ESR1 | TGTGCAATGACTATGCTTCA | 99 | TTTATCAATGGTGCACTGGT | 100
ETS1 | TGGGTGGTTTATACACTGGA | 101 | ATAAGGGTTTCACCCAGCTA | 102 | CCAGATTTGCCCATCCTTCCTCTG | 103
ETS1 | GAGCTCCTCTCCCTCAAGTA | 104 | GGTGACGACTTCTTGTTTGA | 105 | CCTCGGTCATTCTCCGAGACCC | 106
ETS1 | GGATGTCAGGTGAGACTGTG | 107 | GCCTTCAAGTCATTCCTCTC | 108 | CCCTGGCATCACCTGTGCCA | 109
ETS1 | TCAAACAAGAAGTCGTCACC | 110 | GCGATCACAACTATCGTAGC | 111 | CCCGAGTTTACCACGACTGGTCCTC | 112
ETS1 | CCGCTATACCTCGGATTACT | 113 | GCGTCTGATAGGACTCTGTG | 114 | CCCAGTGTGTTCCACCATCGGA | 115
FLJ14624 | ACAGCTGCCATCAGAAACTA | 116 | GCTTCCTGTAGCTCATTCCT | 117 | CCGGGAAGCTGTAAGATTAAATCCCAA | 118
FLJ14624 | TGATAAAGGCAACCAGACAG | 119 | AGCTTCCTGTAGCTCATTCC | 120 | TGCCATCAGAAACTACCGGGAAGC | 121
FLJ14624 | GGATTTCTCGTTATCCCATT | 122 | TCTTGGTATGTTTGCTCAGG | 123 | AGGACACGCTCCGCGACCAC | 124
FLJ14624 | CAAATACAGCCAGACTTTGC | 125 | TTTAATTGCTGTCCGGTAAC | 126 | TGCTGCTTCAAACCGTTTCAGGC | 127
FLJ14624 | CAAATACAGCCAGACTTTGC | 128 | GTTTAATTGCTGTCCGGTAA | 129 | TGCTGCTTCAAACCGTTTCAGGC | 130
FLJ14624 | CTTGGTGATAGGCAAATTCA | 131 | AGCAGGGTCATTCTGAAGAG | 132 | CCCGTTCCTGAGCATGCCGA | 133
FLJ14624 | TTTGGCTGTGCTTTATCATC | 134 | CAGCAGACCGTAATTCTCCT | 135 | TGTTTCTTGGCCAAGTCTAGATGTCCC | 136
FLJ20701 | ATTGGAGGACAAGAGCAGAT | 137 | CCATCGCTCTCTAGATTGG | 138 | TTCTTAGGTGCCGCAGTGCCC | 139
FLJ20701 | CCTTCAGGAAGACTTTCCAC | 140 | CTCAAGTTCATTCAGCCATC | 141 | ACGGGCGGATCCACAGCAAC | 142
FLJ20701 | TGATCAGCAAGTGAACACAC | 143 | CTCGGTGATAGCAAATCAGA | 144 | TGCATCCTTGATGGCAAGCTTCA | 145
FLJ20701 | GTGATCAGCAAGTGAACACA | 146 | CTCGGTGATAGCAAATCAGA | 147 | TGCATCCTTGATGGCAAGCTTCA | 148
FLJ20701 | TTAATGGTCCTGTCTGATGC | 149 | GGGTCTCTAGACAAGCCAAG | 150
FLJ20701 | AGATTTGCAGACACAGAAGC | 151 | TTCTAACATCAGGTGGTTGC | 152
FLJ23091 | TACAACTCACGAATCCCTTCT | 153 | ACAGGAAGTAGAGGCAGAGG | 154
FLJ23091 | CAACTCACGAATCCCTTCTAC | 155 | ACAGGAAGTAGAGGCAGAGG | 156
FLJ23091 | ACAACTCACGAATCCCTTCTA | 157 | ACAGGAAGTAGAGGCAGAGG | 158
FLJ23091 | AACTCACGAATCCCTTCTACA | 159 | ACAGGAAGTAGAGGCAGAGG | 160
FLJ23091 | GGGATTTCCATGACCTTTAT | 161 | ATCCAGAAGGACAGAAGCAT | 162 | TGCCCTGTCGGATGTCACCA | 163
FLJ23091 | ATTGAAGAGGCAATTCCAAG | 164 | TTTAGCTTGAAGGCAATGTC | 165 | CCACATGGAGATGAGTCCTTGGTTCC | 166
ICOS | CATGTGTAATGCTGGATGTG | 167 | AAACAACTCAGGGAACACCT | 168 | TGGACAACCTGACTGGCTTTGCA | 169
ICOS | CAGGCCTCTGGTATTTCTTT | 170 | ATTTGTACACCTCCGTTGTG | 171 | TTGGCAGAACCATTGATTTCTCCTGTT | 172
ICOS | AAACATGAAGTCAGGCCTCT | 173 | GTACACCTCCGTTGTGAAAT | 174 | TTGGCAGAACCATTGATTTCTCCTGTT | 175
ICOS | GTCAGGCCTCTGGTATTTCT | 176 | ATTTGTACACCTCCGTTGTG | 177 | TTGGCAGAACCATTGATTTCTCCTGTT | 178
ICOS | GAAGTCAGGCCTCTGGTATT | 179 | ATTTGTACACCTCCGTTGTG | 180 | TTGGCAGAACCATTGATTTCTCCTGTT | 181
ITCH | CAGATCCAAGGATGAAACAA | 182 | ACCACCATTTGAGAGTGATG | 183 | CACCAGCTCCTGCATCTTCAGGG | 184
ITCH | TCAATCCAGATCACCTGAAA | 185 | AATCCTTGAGTCCAACTGGT | 186 | CCCATGGAACAGAGCCATGGC | 187
ITCH | AAGCTGTTGTTTGCCATAGA | 188 | CAGAGAAGGACAAACATTGC | 189 | TGCCCATTCATGGTGCAAGTTCTC | 190
ITCH | AAGCTGTTGTTTGCCATAGA | 191 | GCAGAGAAGGACAAACATTG | 192 | TGCCCATTCATGGTGCAAGTTCTC | 193
ITCH | GCTGTTGTTTGCCATAGAAG | 194 | CAGAGAAGGACAAACATTGC | 195 | TGCCCATTCATGGTGCAAGTTCTC | 196
ITCH | GCTGTTGTTTGCCATAGAAG | 197 | GCAGAGAAGGACAAACATTG | 198 | TGCCCATTCATGGTGCAAGTTCTC | 199
MBTPS1 | CAATGACGGACCTCTTTATG | 200 | GGTAGCTCCCAGGTAGTCAT | 201 | TGCCGCCTACTCCAATCACATCC | 202
MBTPS1 | TCTGTGGGAAGAAACATCTG | 203 | TGATGAGAATTCCACCTTCA | 204
MBTPS1 | AATCCATCCAGTGACTACCC | 205 | ACTTGAGGGAACGAAAGACT | 206 | AACATCAAACGGGTCACGCCC | 207
MBTPS1 | CAGCCAAAGCTAGAAATTCA | 208 | TAGGGTAGTCACTGGATGGA | 209 | TTCACTGCTCTTCAGGGCACTTGAA | 210
MBTPS1 | GCAATGACGGACCTCTTTAT | 211 | GGTAGCTCCCAGGTAGTCAT | 212 | TGCCGCCTACTCCAATCACATCC | 213
MGC40157 | AGAATCAGCATCATGTTTGG | 214 | ATAACCTTCTCTTGGGCTGA | 215 | CCTCATGGCAGGCTCCTGGC | 216
MGC40157 | TCAGCCCAAGAGAAGGTTAT | 217 | TGAGCATGTCCTCTGATACA | 218 | TCCCAAGGACCAGTAGCTGCCA | 219
MGC40157 | GACTACAGCTCACAGCACAC | 220 | AAAGCTACAACTTGGCCTGT | 221 | TGCCCAGGCTGGTCTCAGGC | 222
MGC40157 | TCAGCCCAAGAGAAGGTTAT | 223 | AGGCAAGCATGTTTCTACAC | 224 | TCCCAAGGACCAGTAGCTGCCA | 225
MGC40157 | ACTACAGCTCACAGCACACC | 226 | AAAGCTACAACTTGGCCTGT | 227 | TGCCCAGGCTGGTCTCAGGC | 228
MGC40157 | TCAGCCCAAGAGAAGGTTAT | 229 | CATAAGGCAAGCATGTTTCT | 230 | TCCCAAGGACCAGTAGCTGCCA | 231
MGC40157 | AGGCTCATGGATCACTCTTT | 232 | GGTACGCAATCCAGTTCTCT | 233 | CCGGCCTTCGCAGACTCCAG | 234
MGC45871 | GACCTCTCTGATGAATGCTG | 235 | AATGACGTGAAGGGTAAGGT | 236 | ACCGGCTCTCCCGCTGTCCT | 237
MGC45871 | GACCTCTCTGATGAATGCTG | 238 | GAATGACGTGAAGGGTAAGG | 239 | ACCGGCTCTCCCGCTGTCCT | 240
MGC45871 | GACCTCTCTGATGAATGCTG | 241 | GAATGACGTGAAGGGTAAGGT | 242 | ACCGGCTCTCCCGCTGTCCT | 243
MGC45871 | GACCTCTCTGATGAATGCTG | 244 | GGAATGACGTGAAGGGTAAG | 245 | ACCGGCTCTCCCGCTGTCCT | 246
MGC45871 | CCTCTCTGATGAATGCTGAC | 247 | GAATGACGTGAAGGGTAAGG | 248 | ACCGGCTCTCCCGCTGTCCT | 249
MGC45871 | ACCTCTCTGATGAATGCTGA | 250 | GAATGACGTGAAGGGTAAGG | 251 | ACCGGCTCTCCCGCTGTCCT | 252
MKLN1 | CCAGTGAACCACAATTCAGT | 253 | ATGCAGTGTCCTATTCGAGA | 254 | TGGATGTCCTCAGGCCCAGCA | 255
MKLN1 | GGATCACACCTATGCTCAAA | 256 | CCATTTCTGTGTCCAGTGAC | 257 | TGACAGCATGACTCCTCCTAAAGGCA | 258
MKLN1 | ATTTGAGAGAGGAGGCTGAG | 259 | AATGAAATGCCTGTCAGTTG | 260 | CCACTGGACAACCACAAACCATTTCTC | 261
MKLN1 | GACTTGTAATGGCAGCGTAG | 262 | CTCGAAGAAGTTTCCAGGTT | 263 | TGACAGCAGAGCCAGTGAACCACA | 264
MKLN1 | ACTTGTAATGGCAGCGTAGA | 265 | CTCGAAGAAGTTTCCAGGTT | 266 | TGACAGCAGAGCCAGTGAACCACA | 267
MMP9 | ACTTTGACAGCGACAAGAAGT | 268 | GCGGTACATAGGGTACATGA | 269 | CGCCGCCACGAGGAACAAAC | 270
MMP9 | AACTTTGACAGCGACAAGAA | 271 | GAAGCGGTACATAGGGTACAT | 272 | CGCCGCCACGAGGAACAAAC | 273
MMP9 | CAGTACCACGGCCAACTAC | 274 | TGGAAGATGAATGGAAACTG | 275 | CCCATCAGCATTGCCGTCCC | 276
MMP9 | CCACTACTGTGCCTTTGAGT | 277 | GTACTTCCCATCCTTGAACA | 278 | TTCCCAATCTCCGCGATGGC | 279
MMP9 | CCACTACTGTGCCTTTGAGT | 280 | CTTCCCATCCTTGAACAAA | 281 | TTCCCAATCTCCGCGATGGC | 282
NIPBL | CACGAATAGCAGAAGAGGTG | 283 | GTATGTCACCTTCTGGGTCA | 284 | CAGCTTGTCCATAGCCTCAACCAGG | 285
NIPBL | CCCATCCTTCAAGTTACACA | 286 | ATTAGCTGAATTGCCAGACA | 287 | TCCACAGATGCAACAAGCATCGG | 288
NIPBL | GGGATTGCTAGTCTCACAGA | 289 | TCTTCTGCTATTCGTGCATT | 290 | TCCTGAACCAGCTGCCTCTTCCA | 291
NIPBL | CCCATCCTTCAAGTTACACA | 292 | AGACAACGGACCAGAAACTT | 293 | TCCACAGATGCAACAAGCATCGG | 294
NIPBL | GCACAGGCTAAGTAGTGACG | 295 | AGGTGGTCTTGAGCCTTTAG | 296 | TTTCCCTTGAGATCTCCACAGCCA | 297
NIPBL | CACGAATAGCAGAAGAGGTG | 298 | GTATGTCACCTTCTGGGTCA | 299 | CAGCTTGTCCATAGCCTCAACCAGG | 300
NIPBL | CCCATCCTTCAAGTTACACA | 301 | ATTAGCTGAATTGCCAGACA | 302 | TCCACAGATGCAACAAGCATCGG | 303
NIPBL | GGGATTGCTAGTCTCACAGA | 304 | TCTTCTGCTATTCGTGCATT | 305 | TCCTGAACCAGCTGCCTCTTCCA | 306
NIPBL | CCCATCCTTCAAGTTACACA | 307 | AGACAACGGACCAGAAACTT | 308 | TCCACAGATGCAACAAGCATCGG | 309
NIPBL | GCACAGGCTAAGTAGTGACG | 310 | AGGTGGTCTTGAGCCTTTAG | 311 | TTTCCCTTGAGATCTCCACAGCCA | 312
RPS24 | CACCGTAACTATCCGCACTA | 313 | CGGTGTGGTCTTGTACATTT | 314 | AGGCACTGTCGCCTTCCCGG | 315
RPS24 | AAGAAAGCAACGAAAGGAAC | 316 | TCATTGCAGCACCTTTACTC | 317 | TGCCAGCACCAACATTGGCC | 318
RPS24 | AGACATGGCCTGTATGAGAA | 319 | ATCCAATCTCCAGCTCACTT | 320 | TGCCAGCACCAACATTGGCC | 321
RPS24 | AAGACATGGCCTGTATGAGA | 322 | ATCCAATCTCCAGCTCACTT | 323 | TGCCAGCACCAACATTGGCC | 324
RPS24 | GACATGGCCTGTATGAGAAG | 325 | ATCCAATCTCCAGCTCACTT | 326 | TGCCAGCACCAACATTGGCC | 327
RPS24 | AAGGAACGCAAGAACAGAAT | 328 | ATTGCAGCACCTTTACTCCT | 329 | TGCCAGCACCAACATTGGCC | 330
SMARCA1 | TACCTGGTCATTGATGAAGC | 331 | CAAAGGTGTTCCAGTTAGGA | 332 | CGTGAGTTCAAGTCGACTAACCGCTTG | 333
SMARCA1 | TACCTGGTCATTGATGAAGC | 334 | TTATTCTGCAAAGGTGTTCC | 335 | TTCAAGTCGACTAACCGCTTGCTCC | 336
SMARCA1 | TGGACCCAGAATATGAAGAG | 337 | ATGTTCAGTGGAGATGTTGG | 338 | TCTCTTTGCTCGGTCGGCTTTCA | 339
SMARCA1 | CTTTGCTTGGTTACCTGAAA | 340 | AAACAAATGACACGGAGAGA | 341 | CCGAAATATTCCTGGACCTCACATGG | 342
SMARCA1 | GGAAATGGACCCAGAATATG | 343 | GTTGGAGATTTCTGTGCTGA | 344 | TCTCTTTGCTCGGTCGGCTTTCA | 345
SMARCA1 | AAGGAAATGGACCCAGAATA | 346 | CTGAAGGCTGAATGAAATGT | 347 | TCTCTTTGCTCGGTCGGCTTTCA | 348
SMARCA1 | AGTGGGATGTTTGCGTTACT | 349 | CACGAACAATCTCTGAAAGC | 350 | TCATCAATGACCAGGTATCGCCAGTG | 351

TABLE 5
Gene Symbol | Description | Commercially Available Antibody Reference | Scientific Reference
APEH | N-acylaminoacyl-peptide hydrolase
C1orf22 | chromosome 1 open reading frame 22
ESR1 | estrogen receptor 1 | Ab2746 (Abcam) | Shevde NK & Pike JW. Estrogen modulates the recruitment of myelopoietic cell progenitors in rat through a stromal cell-independent mechanism involving apoptosis. Blood 87: 2683-92 (1996). Yang NN et al. Identification of an estrogen response element activated by metabolites of 17beta-estradiol and raloxifene. Science 273: 1222-5 (1996).
ETS1 | v-ets erythroblastosis virus E26 oncogene homolog 1 (avian) | Ab10936 (Abcam) | Pande P et al. Ets-1: a plausible marker of invasive potential and lymph node metastasis in human oral squamous cell carcinomas. J Pathol 189: 40-5 (1999). Nakayama T et al. Expression of the Ets-1 proto-oncogene in human gastric carcinoma: correlation with tumor invasion. Am J Pathol 149: 1931-9 (1996). Bories JC et al. Increased T-cell apoptosis and terminal B-cell differentiation induced by inactivation of the Ets-1 proto-oncogene. Nature 377: 635-8 (1995). Wernert N et al. Stromal expression of c-Ets1 transcription factor correlates with tumor invasion. Cancer Res 54: 5683-8 (1994).
FLJ14624
FLJ20701 | hypothetical protein FLJ20701
FLJ23091 | putative NFkB activating protein 373
G2 | G2 protein
ICOS | inducible T-cell co-stimulator | Ab3744 (Abcam)
ITCH | itchy homolog E3 ubiquitin protein ligase (mouse)
MBTPS1 | membrane-bound transcription factor protease, site 1
MGC40157 | hypothetical protein MGC40157
MGC45871 | hypothetical protein MGC45871
MKLN1 | muskelin 1, intracellular mediator containing kelch motifs
MMP9 | matrix metalloproteinase 9 (gelatinase B, 92 kDa gelatinase, 92 kDa type IV collagenase) | Ab5707
NIPBL | Nipped-B homolog (Drosophila)
RPS24 | ribosomal protein S24
SMARCA1 | SWI/SNF related, matrix associated, actin dependent regulator of chromatin, subfamily a, member 1 | Ab21924 (Abcam) | Lazzaro MA & Picketts DJ. Cloning and characterization of the murine Imitation Switch (ISWI) genes: differential expression patterns suggest distinct developmental roles for Snf2h and Snf2l. J Neurochem 77: 1145-56 (2001).

TABLE 6
Symbol | Ref. ID | 5′ Primer Sequence | 5′ Position | SEQ ID NO: | 3′ Primer Sequence | 3′ Position | SEQ ID NO: | Product Length
APEH | NM_001640 | AAGGATGTCCAGTTTGCAGTGG | 2081 | 878 | TGGCAGGAAATGAAGCCACCAT | 2184 | 886 | 104
FLJ23091 | NM_024911 | ACAGGCATCTATGGGATGTGGA | 1694 | 879 | AGATCGCCATTGGACTGGTCTT | 1794 | 887 | 101
MBTPS1 | NM_003791 | TCGGTACTCCAAGGTTCTGGA | 3220 | 880 | TGTTTCCAAAGGTTACTGGGCG | 3348 | 888 | 129
MGC40157 | NM_152350 | ACCCGAGAGAACTGGATTGCGT | 359 | 881 | GCTCCAATACTCAGCTGCCAAA | 469 | 889 | 111
MGC45871 | NM_182705 | CCACCAAAGGAAGTAAGGTACAC | 362 | 882 | TAGTTGCGCCACGTGCCATT | 507 | 890 | 146
MKLN1 | NM_013255 | GAGGGCCGAAATTGGTGTTTGA | 1169 | 883 | ATTGTGGTTCACTGGCTCTGCT | 1297 | 891 | 129
NIPBL | NM_015384 | AGTGTACGCCACTTTGCCCTAA | 6886 | 884 | ATCAGCCTTGTTCCGCATAGCA | 7017 | 892 | 132
PPP1R2 | NM_006241 | AAGATGCCTGTAGTGACACCGA | 250 | 885 | ATCCGATACTTTGGCTCCAAGC | 349 | 893 | 100
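
The Product Length column of Table 6 is consistent with an inclusive span between the two listed primer positions (for APEH, 2184 − 2081 + 1 = 104). The short Python sketch below rechecks every row under that assumption; the dictionary `TABLE6` and the helper `amplicon_length` are illustrative names, not part of the original disclosure.

```python
# Values copied from Table 6:
# symbol -> (5' primer position, 3' primer position, listed product length)
TABLE6 = {
    "APEH": (2081, 2184, 104),
    "FLJ23091": (1694, 1794, 101),
    "MBTPS1": (3220, 3348, 129),
    "MGC40157": (359, 469, 111),
    "MGC45871": (362, 507, 146),
    "MKLN1": (1169, 1297, 129),
    "NIPBL": (6886, 7017, 132),
    "PPP1R2": (250, 349, 100),
}


def amplicon_length(five_prime_pos: int, three_prime_pos: int) -> int:
    """Inclusive span between the two primer positions (assumed convention)."""
    if three_prime_pos < five_prime_pos:
        raise ValueError("3' primer position precedes the 5' primer position")
    return three_prime_pos - five_prime_pos + 1


if __name__ == "__main__":
    for symbol, (start, end, listed) in TABLE6.items():
        computed = amplicon_length(start, end)
        status = "ok" if computed == listed else "MISMATCH"
        print(f"{symbol}: computed {computed}, listed {listed} ({status})")
```

Every row of Table 6 agrees with this convention, which suggests each listed position marks the outermost base of its primer on the reference transcript.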

TABLE 7
Parameter | AJ36h No Polyp | AJ36h Polyp | Blind test No Polyp | Blind test Polyp
Sample Size | 110 | 68 | 40 | 40
Gender (F/M) | 55/54* | 22/45* | 21/19 | 13/27
Age mean (range) | 57 (23~83) | 57 (38~82) | 57 (40~79) | 60 (38~76)
Polyp Subtype: Tubular adenoma | | 21 (31%) | | 14 (35%)
Polyp Subtype: Hyperplastic | | 18 (27%) | | 15 (38%)
Polyp Subtype: High risk pathology | | 7 (10%) | | 3 (7.5%)
Polyp Subtype: Others | | 22 (32%) | | 8 (20%)
*one sample missing information

TABLE 8
Equ # | # ratios | ROC area | Constant | MGC45871-APEH | MKLN1-APEH | MGC45871-FLJ23091 | MKLN1-FLJ23091 | NIPBL-FLJ23091
26692 | 5 | 0.718 | −2.483 | 0 | 0.173 | 0 | −0.291 | 0
26690 | 5 | 0.718 | −2.483 | 0.173 | 0 | 0 | −0.291 | 0
26658 | 5 | 0.718 | −2.483 | 0.173 | 0 | −0.291 | 0 | 0
26660 | 5 | 0.718 | −2.483 | 0 | 0.173 | −0.291 | 0 | 0
25732 | 5 | 0.718 | −3.284 | 0 | 0.441 | 0 | 0 | −0.230
25218 | 5 | 0.718 | −3.285 | 0.441 | 0 | 0 | 0 | −0.230
25730 | 5 | 0.718 | −3.284 | 0.441 | 0 | 0 | 0 | −0.230
25220 | 5 | 0.718 | −3.285 | 0 | 0.441 | 0 | 0 | −0.230
17030 | 5 | 0.718 | −3.285 | −0.279 | 0.720 | 0 | 0 | −0.230
18052 | 5 | 0.718 | −3.284 | 0 | 0.441 | 0 | 0 | −0.230
Equ # | MGC45871-MGC40157 | MKLN1-MGC40157 | NIPBL-MGC40157 | MGC45871-PPP1R2 | MKLN1-PPP1R2 | Thresh
26692 | 0 | 0 | −0.173 | −0.863 | −0.929 | −0.481
26690 | 0 | 0 | −0.173 | −1.036 | −0.756 | −0.481
26658 | 0 | 0 | −0.173 | −0.745 | −1.047 | −0.481
26660 | 0 | 0 | −0.173 | −0.572 | −1.220 | −0.481
25732 | 0 | −0.514 | 0 | −0.793 | −0.916 | −0.014
25218 | −0.514 | 0 | 0 | −0.720 | −0.989 | −0.014
25730 | 0 | −0.514 | 0 | −1.234 | −0.475 | −0.014
25220 | −0.514 | 0 | 0 | −0.279 | −1.430 | −0.014
17030 | −0.514 | 0 | 0 | 0 | −1.709 | −0.014
18052 | −0.793 | 0.279 | 0 | 0 | −1.709 | −0.014
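
Each Table 8 equation lists a constant, one coefficient per gene-pair ratio, and a threshold; every equation has exactly five nonzero coefficients, matching its "# ratios" entry. The table itself does not say how these numbers are combined, so the Python sketch below is only a hypothetical reading, not the method of the disclosure. It assumes the score is the constant plus the coefficient-weighted sum of the ratio values and that a score above Thresh counts as a positive call. How each ratio value is measured is left open, and the names `EQUATION_26692`, `equation_score` and `is_positive_call` are illustrative.

```python
# Hypothetical encoding of equation 26692 from Table 8 (zero-weight terms omitted).
EQUATION_26692 = {
    "constant": -2.483,
    "threshold": -0.481,
    "coefficients": {
        "MKLN1-APEH": 0.173,
        "MKLN1-FLJ23091": -0.291,
        "NIPBL-MGC40157": -0.173,
        "MGC45871-PPP1R2": -0.863,
        "MKLN1-PPP1R2": -0.929,
    },
}


def equation_score(equation: dict, ratio_values: dict[str, float]) -> float:
    """ASSUMED form: constant + sum(coefficient * ratio value)."""
    missing = set(equation["coefficients"]) - set(ratio_values)
    if missing:
        raise KeyError(f"missing ratio values: {sorted(missing)}")
    weighted = sum(
        coef * ratio_values[name] for name, coef in equation["coefficients"].items()
    )
    return equation["constant"] + weighted


def is_positive_call(equation: dict, ratio_values: dict[str, float]) -> bool:
    """ASSUMED decision rule: a score above the threshold is a positive call."""
    return equation_score(equation, ratio_values) > equation["threshold"]


if __name__ == "__main__":
    # Placeholder inputs only: with every ratio value at zero the score equals
    # the constant, which lies below this equation's threshold.
    zeros = {name: 0.0 for name in EQUATION_26692["coefficients"]}
    print(equation_score(EQUATION_26692, zeros), is_positive_call(EQUATION_26692, zeros))
```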

TABLE 9
Parameter | Blind
Number of Samples | 80
Number of Equations | 10
Sensitivity (TPF) | 43% (17/40)
Specificity (TNF) | 80% (32/40)
Overall accuracy | 61% (49/80)
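
The Table 9 figures follow directly from the blind-test counts. With 40 polyp and 40 no-polyp samples, 17 correct polyp calls give a sensitivity of 17/40 = 42.5%, 32 correct no-polyp calls give a specificity of 32/40 = 80%, and 49 correct calls out of 80 give an overall accuracy of 61.25%. The reported 43% and 61% are consistent with rounding halves upward. The Python sketch below reproduces that arithmetic; treating "polyp" as the positive class is an assumption, and the function names are illustrative.

```python
from decimal import ROUND_HALF_UP, Decimal


def percent_half_up(numerator: int, denominator: int) -> int:
    """Whole-number percentage with halves rounded up (42.5 -> 43)."""
    exact = Decimal(numerator * 100) / Decimal(denominator)
    return int(exact.quantize(Decimal("1"), rounding=ROUND_HALF_UP))


def blind_test_metrics(tp: int, fn: int, tn: int, fp: int) -> dict[str, tuple[int, int]]:
    """Return (correct, total) pairs for sensitivity, specificity and accuracy."""
    return {
        "Sensitivity (TPF)": (tp, tp + fn),
        "Specificity (TNF)": (tn, tn + fp),
        "Overall accuracy": (tp + tn, tp + fn + tn + fp),
    }


if __name__ == "__main__":
    # Blind-test counts: 40 polyp samples (17 called correctly) and
    # 40 no-polyp samples (32 called correctly).
    metrics = blind_test_metrics(tp=17, fn=23, tn=32, fp=8)
    for name, (correct, total) in metrics.items():
        print(f"{name}: {percent_half_up(correct, total)}% ({correct}/{total})")
```

Run as a script, it prints one line per Table 9 metric, for example `Sensitivity (TPF): 43% (17/40)`. `Decimal` is used so that the 42.5 case rounds up. Python's built-in `round` applies banker's rounding and would return 42.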

TABLE 10
Reporter Gene | Protein Activity & Measurement
CAT (chloramphenicol acetyltransferase) | Transfers radioactive acetyl groups to chloramphenicol; detection by thin layer chromatography and autoradiography
GAL (beta-galactosidase) | Hydrolyzes colorless galactosides to yield colored products
GUS (beta-glucuronidase) | Hydrolyzes colorless glucuronides to yield colored products
LUC (luciferase) | Oxidizes luciferin, emitting photons
GFP (green fluorescent protein) | Fluorescent protein; no substrate required
SEAP (secreted alkaline phosphatase) | Luminescence reaction with suitable substrates, or with substrates that generate chromophores
HRP (horseradish peroxidase) | In the presence of hydrogen peroxide, oxidizes 3,3′,5,5′-tetramethylbenzidine to form a colored complex
AP (alkaline phosphatase) | Luminescence reaction with suitable substrates, or with substrates that generate chromophores

Table 11 is filed herewith via EFS-Web, identified as file table11.txt.

TABLE 12
AffySpotID | Gene | Gene ID | p value | CCa/Ctrl | Direction | Default Gene Description
219073_s_at | OSBPL10 | 114884 | 3.8499E−08 | 0.451209844 | downregulated | oxysterol binding protein-like 10
1563498_s_at | LOC283130 | 283130 | 8.1224E−06 | 0.628849872 | downregulated | hypothetical protein LOC283130
219667_s_at | BANK1 | 55024 | 1.0668E−10 | 0.430259207 | downregulated | B-cell scaffold protein with ankyrin repeats 1
205547_s_at | TAGLN | 6876 | 0.40991641 | 0.889953014 | downregulated | transgelin
203642_s_at | COBLL1 | 22837 | 5.3941E−07 | 0.474077388 | downregulated | COBL-like 1
228551_at | MGC24039 | 160518 | 0.05283113 | 0.695785002 | downregulated | hypothetical protein MGC24039
235866_at | C9orf85 | 138241 | 4.2656E−06 | 0.649259289 | downregulated | chromosome 9 open reading frame 85
207655_s_at | BLNK | 29760 | 4.7146E−09 | 0.508654531 | downregulated | B-cell linker
230983_at | BCNP1 | 199786 | 1.8615E−11 | 0.422871727 | downregulated | B-cell novel protein 1
208591_s_at | PDE3B | 5140 | 0.05166031 | 0.826542405 | downregulated | phosphodiesterase 3B, cGMP-inhibited
201340_s_at | ENC1 | 8507 | 0.62995039 | 0.93326798 | downregulated | ectodermal-neural cortex (with BTB-like domain)
208325_s_at | AKAP13 | 11214 | 0.00033172 | 0.80202517 | downregulated | A kinase (PRKA) anchor protein 13
233559_s_at | WDFY1 | 57590 | 0.00920709 | 0.845905759 | downregulated | WD repeat and FYVE domain containing 1
205627_at | CDA | 978 | 9.2103E−05 | 1.343547864 | upregulated | cytidine deaminase
1555736_a_at | AGTRAP | 57085 | 0.09637521 | 0.906895763 | downregulated | angiotensin II receptor-associated protein
1554390_s_at | ACTR2 | 10097 | 0.03306275 | 0.892740407 | downregulated | ARP2 actin-related protein 2 homolog (yeast)
201554_x_at | GYG | 2992 | 0.51537492 | 1.055438457 | upregulated | glycogenin 1
1559051_s_at | C6orf150 | 115004 | 0.97824175 | 1.002133453 | upregulated | chromosome 6 open reading frame 150
207795_s_at | KLRD1 | 3824 | 0.19904885 | 0.861882517 | downregulated | killer cell lectin-like receptor subfamily D, member 1
220784_s_at | UTS2 | 10911 | 0.04243446 | 0.627549503 | downregulated | urotensin 2
217418_x_at | MS4A1 | 931 | 1.3139E−09 | 0.423617526 | downregulated | membrane-spanning 4-domains, subfamily A, member 1
1563674_at | SPAP1 | 79368 | 9.3602E−09 | 0.400736546 | downregulated | Fc receptor-like 2
1570259_at | LIMS1 | 3987 | 0.92270088 | 0.990561902 | downregulated | LIM and senescent cell antigen-like domains 1
228988_at | ZNF6 | 7552 | 0.26569378 | 0.691253833 | downregulated | zinc finger protein 6 (CMPX1)
215314_at | ANK3 | 288 | 0.03458547 | 0.655571614 | downregulated | ankyrin 3, node of Ranvier (ankyrin G)
232911_at | KIAA1559 | 57677 | 2.7111E−07 | 0.640258188 | downregulated | mouse zinc finger protein 14-like
200661_at | PPGB | 5476 | 0.6034369 | 1.039669304 | upregulated | protective protein for beta-galactosidase (galactosialidosis)
238581_at | GBP5 | 115362 | 0.03263835 | 1.323760479 | upregulated | guanylate binding protein 5
212859_x_at | MT1E | 4493 | 0.46242859 | 0.892029229 | downregulated | metallothionein 1E (functional)
229893_at | MGC20553 | 257019 | 0.02322844 | 1.294641476 | upregulated | FERM domain containing 3 (FRMD3)
204415_at | G1P3 | 2537 | 0.98563356 | 1.003662309 | upregulated | interferon, alpha-inducible protein (clone IFI-6-16)
211889_x_at | CEACAM1 | 634 | 0.12181619 | 1.155086057 | upregulated | carcinoembryonic antigen-related cell adhesion molecule 1 (biliary glycoprotein)
1558733_at | FLJ35036 | 253461 | 0.9989125 | 1.000149237 | - | zinc finger and BTB domain containing 38
235885_at | P2RY12 | 64805 | 0.64527963 | 1.061974664 | upregulated | purinergic receptor P2Y, G-protein coupled, 12
208180_s_at | HIST1H4H | 8365 | 1.275E−05 | 0.673893846 | downregulated | histone 1, H4h
204627_s_at | ITGB3 | 3690 | 0.27856391 | 1.238694854 | upregulated | integrin, beta 3 (platelet glycoprotein IIIa, antigen CD61)
1554676_at | PRG1 | 5552 | 0.14822954 | 1.10791216 | upregulated | proteoglycan 1, secretory granule
208685_x_at | BRD2 | 6046 | 0.01411177 | 0.840822417 | downregulated | bromodomain containing 2
219922_s_at | LTBP3 | 4054 | 0.00028011 | 0.553407885 | downregulated | latent transforming growth factor beta binding protein 3
218311_at | MAP4K3 | 8491 | 5.0346E−06 | 0.680583549 | downregulated | mitogen-activated protein kinase kinase kinase kinase 3
212129_at | NIPA2 | 81614 | 0.00333649 | 0.8287373 | downregulated | non imprinted in Prader-Willi/Angelman syndrome 2
229039_at | SYN2 | 6854 | 0.3194711 | 1.139160086 | upregulated | synapsin II
203645_s_at | CD163 | 9332 | 0.76 | 0.98 | downregulated | CD163 molecule
221557_s_at | LEF1 | 51176 | 0.28 | 0.87 | downregulated | lymphoid enhancer-binding factor 1
1552772_at | CLEC4D | 362432 | 0.01 | 1.31 | upregulated | C-type lectin domain family 4, member d
220001_at | PADI4 | 23569 | 0.06 | 1.18 | upregulated | peptidyl arginine deiminase, type IV
206026_s_a | TNFAIP6 | 460097 | 0.03 | 1.29 | upregulated | tumor necrosis factor, alpha-induced protein 6
202431_s_at | MYC | 4609 | 0.06 | 0.84 | downregulated | v-myc myelocytomatosis viral oncogene homolog (avian)
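
In Table 12, every row labeled "upregulated" has a CCa/Ctrl ratio above 1, and every row labeled "downregulated" has a ratio below 1. The Python sketch below applies that rule and filters probe sets by p value. The 0.05 cutoff and the helper names are illustrative assumptions, not criteria stated in the table. The FLJ35036 row carries no direction label even though its ratio (1.000149237) is marginally above 1, so the rule reproduces only the labels the table actually assigns.

```python
from typing import NamedTuple


class ProbeResult(NamedTuple):
    """One Table 12 row, reduced to the fields used here."""
    affy_spot_id: str
    gene: str
    p_value: float
    ratio: float  # CCa/Ctrl expression ratio


def direction(ratio: float) -> str:
    """Label a CCa/Ctrl ratio the way the labeled Table 12 rows are labeled."""
    if ratio > 1:
        return "upregulated"
    if ratio < 1:
        return "downregulated"
    return "unchanged"  # assumption: no Table 12 row has a ratio of exactly 1


def significant(results: list[ProbeResult], alpha: float = 0.05) -> list[ProbeResult]:
    """Keep rows whose p value falls below an illustrative cutoff."""
    return [r for r in results if r.p_value < alpha]


# Three rows copied from Table 12.
ROWS = [
    ProbeResult("219073_s_at", "OSBPL10", 3.8499e-08, 0.451209844),
    ProbeResult("205547_s_at", "TAGLN", 0.40991641, 0.889953014),
    ProbeResult("205627_at", "CDA", 9.2103e-05, 1.343547864),
]

if __name__ == "__main__":
    for row in significant(ROWS):
        print(row.affy_spot_id, row.gene, direction(row.ratio))
```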

TABLE 13
AffySpotID | GeneSymbolCG | Sequence | RNA Acc'n | Protein Acc'n
1554390_s_at | ACTR2 | SEQ ID NO: 1 | NM_001005386 | NP_001005386
1554390_s_at | ACTR2 | SEQ ID NO: 2 | NM_005722 | NP_005713
1555736_a_at | AGTRAP | SEQ ID NO: 3 | NM_020350 | NP_065083
1559051_s_at | C6orf150 | SEQ ID NO: 4 | NM_138441 | NP_612450
1563674_at | SPAP1 | SEQ ID NO: 5 | NM_030764 | NP_110391
200661_at | PPGB | SEQ ID NO: 6 | NM_000308 | NP_000299
201340_s_at | ENC1 | SEQ ID NO: 7 | NM_003633 | NP_003624
201554_x_at | GYG | SEQ ID NO: 8 | NM_004130 | NP_004121
203642_s_at | COBLL1 | SEQ ID NO: 9 | NM_014900 | NP_055715
203645_s_at | CD163 | SEQ ID NO: 10 | NM_004244 | NP_004235
203645_s_at | CD163 | SEQ ID NO: 11 | NM_203416 | NP_981961
204415_at | G1P3 | SEQ ID NO: 12 | NM_002038 | NP_002029
204415_at | G1P3 | SEQ ID NO: 13 | NM_022872 | NP_075010
204415_at | G1P3 | SEQ ID NO: 14 | NM_022873 | NP_075011
204627_s_at | ITGB3 | SEQ ID NO: 15 | NM_000212 | NP_000203
205547_s_at | TAGLN | SEQ ID NO: 16 | NM_001001522 | NP_001001522
205547_s_at | TAGLN | SEQ ID NO: 17 | NM_003186 | NP_003177
205627_at | CDA | SEQ ID NO: 18 | NM_001785 | NP_001776
207655_s_at | BLNK | SEQ ID NO: 19 | NM_013314 | NP_037446
207795_s_at | KLRD1 | SEQ ID NO: 20 | NM_002262 | NP_002253
207795_s_at | KLRD1 | SEQ ID NO: 21 | NM_007334 | NP_031360
208180_s_at | HIST1H4H | SEQ ID NO: 22 | NM_003543 | NP_003534
208325_s_at | AKAP13 | SEQ ID NO: 23 | NM_006738 | NP_006729
208325_s_at | AKAP13 | SEQ ID NO: 24 | NM_007200 | NP_009131
208325_s_at | AKAP13 | SEQ ID NO: 25 | NM_144767 | NP_658913
208591_s_at | PDE3B | SEQ ID NO: 26 | NM_000922 | NP_000913
208685_x_at | BRD2 | SEQ ID NO: 27 | NM_005104 | NP_005095
211889_x_at | CEACAM1 | SEQ ID NO: 28 | NM_001712 | NP_001703
212129_at | NIPA2 | SEQ ID NO: 29 | NM_001008860 | NP_001008860
212129_at | NIPA2 | SEQ ID NO: 30 | NM_001008892 | NP_001008892
212129_at | NIPA2 | SEQ ID NO: 31 | NM_001008894 | NP_001008894
212129_at | NIPA2 | SEQ ID NO: 32 | NM_030922 | NP_112184
217418_x_at | MS4A1 | SEQ ID NO: 33 | NM_021950 | NP_068769
217418_x_at | MS4A1 | SEQ ID NO: 34 | NM_152866 | NP_690605
218311_at | MAP4K3 | SEQ ID NO: 35 | NM_003618 | NP_003609
219073_s_at | OSBPL10 | SEQ ID NO: 36 | NM_017784 | NP_060254
219667_s_at | BANK1 | SEQ ID NO: 37 | NM_017935 | NP_060405
219922_s_at | LTBP3 | SEQ ID NO: 38 | NM_021070 | NP_066548
220784_s_at | UTS2 | SEQ ID NO: 39 | NM_006786 | NP_006777
220784_s_at | UTS2 | SEQ ID NO: 40 | NM_021995 | NP_068835
228988_at | ZNF6 | SEQ ID NO: 41 | NM_021998 | NP_068838
229039_at | SYN2 | SEQ ID NO: 42 | NM_003178 | NP_003169
230983_at | BCNP1 | SEQ ID NO: 43 | NM_173544 | NP_775815
232911_at | KIAA1559 | SEQ ID NO: 44 | NM_020917 | NP_065968
233559_s_at | WDFY1 | SEQ ID NO: 45 | NM_020830 | NP_065881
235885_at | P2RY12 | SEQ ID NO: 46 | NM_022788 | NP_073625
235885_at | P2RY12 | SEQ ID NO: 47 | NM_176876 | NP_795345
229893_at | MGC20553 (FRMD3) | SEQ ID NO: 48 | NM_174938 | NP_777598
221557_s_at | LEF1 | SEQ ID NO: 49 | NM_016269 | NP_057353
1552772_at | CLEC4D | SEQ ID NO: 50 | NM_080387 | NP_525126
220001_at | PADI4 | SEQ ID NO: 51 | NM_012387 | NP_036519
206026_s_a | TNFAIP6 | SEQ ID NO: 52 | NM_007115 | NP_009046
202431_s_at | MYC | SEQ ID NO: 53 | NM_002467 | NP_002458

TABLE 14
Gene | Gene ID | RNA Acc′n | Sense Primer (5′ Primer) | SEQ ID NO:
OSBPL10 | 114884 | NM_017784 | GCTGGTGGTGTACTCTGCTA | 352
OSBPL10 | 114884 | NM_017784 | GTGCCTCAACTTGTTACAGC | 353
OSBPL10 | 114884 | NM_017784 | TCTATGGAGGGAAAGTCCAC | 354
LOC283130 | 283130 | NM_182556 | TGGTCTTGATCTCCTGACTT | 355
LOC283130 | 283130 | NM_182556 | TTCTTCAAGGGAATGAGCTT | 356
LOC283130 | 283130 | NM_182556 | TCAACTCTGTCTCCTCTGCT | 357
BANK1 | 55024 | AK091523 | ACAGGCATCACTACTTCCAA | 358
BANK1 | 55024 | AK091523 | TGAAAGCAGGAAGACATACG | 359
BANK1 | 55024 | AK091523 | TATCCAGCTTCACTTTCTGC | 360
TAGLN | 6876 | NM_003186 | AACCCAGACACAAGTCTTCA | 361
TAGLN | 6876 | NM_003186 | AGCAATGGTAACTGCACCT | 362
TAGLN | 6876 | NM_003186 | TGACATGTTCCAGACTGTTG | 363
COBLL1 | 22837 | NM_014900 | ATGGCTAGATGTCCCAAGTT | 364
COBLL1 | 22837 | NM_014900 | GCTTGGTGTGTCTGATAAGG | 365
COBLL1 | 22837 | NM_014900 | CAGAACAGATGCGACAGAGT | 366
MGC24039 | 160518 | AK125323 | TTGACAAAGCTTCCTTTCTG | 367
MGC24039 | 160518 | AK125323 | ACCTTTGAGTGCCAGAACTT | 368
MGC24039 | 160518 | AK125323 | GGACACAAACGGACAATAAA | 369
C9orf85 | 138241 | NM_182505 | GCACCAGAATACGTTTAGCTT | 370
C9orf85 | 138241 | NM_182505 | CTGGGAGTAGTTCGTTGGTT | 371
C9orf85 | 138241 | NM_182505 | GGAGATTGAAGTGAGCTGAG | 372
BLNK | 29760 | NM_013314 | CTTGAGACCAGAGGCTTACC | 373
BLNK | 29760 | NM_013314 | GCTATTGAAGTGGTCATCCA | 374
BLNK | 29760 | NM_013314 | CCGTGGAAGATAATGATGAA | 375
BCNP1 | 199786 | NM_173544 | ACTGGATATTGGCAGCTTCT | 376
BCNP1 | 199786 | NM_173544 | GCAGCTCCAAATCTTAACTTG | 377
BCNP1 | 199786 | NM_173544 | TTTCCTCCCATTCTGTCTG | 378
PDE3B | 5140 | NM_000922 | AATGGCTATCGAGACATTCC | 379
PDE3B | 5140 | NM_000922 | CTTCCACCACAAGTCATTTC | 380
PDE3B | 5140 | NM_000922 | AGGTGGGATCGTAATAATGG | 381
ENC1 | 8507 | NM_003633 | TGCCGTCGTAGGTATTAGTG | 382
ENC1 | 8507 | NM_003633 | TGTGCCGTCGTAGGTATTAG | 383
ENC1 | 8507 | NM_003633 | CAAACCATCAGGAAGAATGA | 384
AKAP13 | 11214 | NM_006738 | CTCCTGGTCATGATTGTTGT | 385
AKAP13 | 11214 | NM_006738 | AGGTTCTGGTTGGACAAGTT | 386
AKAP13 | 11214 | NM_006738 | GTTGCACAGACTGAAAGTCC | 387
WDFY1 | 57590 | NM_020830 | GACAAGTGTGTGAGCTGGAT | 388
WDFY1 | 57590 | NM_020830 | GCTGAAGCTTGAACAGAACA | 389
WDFY1 | 57590 | NM_020830 | CTGTGCTACCTTCAGCTCAC | 390
CDA | 978 | NM_001785 | CAAAGGGTGCAACATAGAAA | 391
CDA | 978 | NM_001785 | TACAAGGATTTCAGGGCAAT | 392
CDA | 978 | NM_001785 | TGCCTTGGGACTTAGAACAC | 393
AGTRAP | 57085 | NM_020350 | GTTCTAGGGATGCTCCTGAC | 394
AGTRAP | 57085 | NM_020350 | CTCCTGCTGCTTCGTCTAC | 395
AGTRAP | 57085 | NM_020350 | CACCATCTTCCTGGACATC | 396
ACTR2 | 10097 | NM_005722 | GTGCTTTCTGGAGGGTCTAC | 397
ACTR2 | 10097 | NM_005722 | AGACTCTGGAGATGGTGTGA | 398
ACTR2 | 10097 | NM_005722 | AGAGCAGAAACTGGCCTTAG | 399
GYG1 | 2992 | NM_004130 | TGTGTACCTTTCACGAGACC | 400
GYG1 | 2992 | NM_004130 | CCAAAGTTGTGCATTTCCT | 401
GYG1 | 2992 | NM_004130 | TGTGGCTTCTGTAGAAAGGA | 402
C6orf150 | 115004 | NM_138441 | TATAACCCTGGCTTTGGAAT | 403
C6orf150 | 115004 | NM_138441 | GATATAACCCTGGCTTTGGA | 404
C6orf150 | 115004 | NM_138441 | ACTGCCTTCTTTCACGTATG | 405
KLRD1 | 3824 | AF498040 | ACAATTCAACGCTGTTCTTT | 406
KLRD1 | 3824 | NM_007334 | CGGTGCAACTGTTACTTCAT | 407
KLRD1 | 3824 | NM_007334 | CACATCGTGCCTTCTCTACT | 408
UTS2 | 10911 | NM_021995 | TTATGCTCTGCGTCACTTCT | 409
UTS2 | 10911 | NM_021995 | CACTTCTGCTCGGACTCATA | 410
UTS2 | 10911 | NM_021995 | CTTTCAACTCTCAGCACCTC | 411
MS4A1 | 931 | NM_021950 | TCGTTGAGAATGAATGGAAA | 412
MS4A1 | 931 | NM_021950 | ACCCATCTGTGTGACTGTGT | 413
MS4A1 | 931 | NM_021950 | GTGTTGTCACGCTTCTTCTT | 414
SPATA1 | 64173 | NM_022354 | AAGTCACAGCATCAATGGAG | 415
SPATA1 | 64173 | NM_022354 | CTGATGGAACAATCCACAGA | 416
SPATA1 | 64173 | NM_022354 | CGATATCATGCCTACAATGG | 417
LIMS1 | 3987 | NM_004987 | AGCTGTACCATGAGCAGTGT | 418
LIMS1 | 3987 | NM_004987 | TATCGGGTTTGTCAAGAATG | 419
LIMS1 | 3987 | NM_004987 | GTGATGTGGTCTCTGCTCTT | 420
ZNF6 | 7552 | NM_021998 | GATGGGTTTGGTTCTGAAGT | 421
ZNF6 | 7552 | NM_021998 | TCAGCTTGAGGACTCTGATG | 422
ZNF6 | 7552 | NM_021998 | AAGACCCATACTGGAAGGAA | 423
ANK3 | 288 | NM_001149 | TTGATCATCCAAGCCTAGTG | 424
ANK3 | 288 | NM_001149 | AATTCACCGAGAAGTGTGTG | 425
ANK3 | 288 | NM_001149 | ATGGACCTCCAGTCGTAACT | 426
KIAA1559 | 57677 | NM_020917.1 | AAAATTAGCCAGGCATGGTG | 427
KIAA1559 | 57677 | NM_020917.1 | ACCAAATTTCAGGGTCACCA | 428
KIAA1559 | 57677 | NM_020917.1 | ACCAAATTTCAGGGTCACCA | 429
PPGB | 5476 | NM_000308 | GAGAGCTATGCTGGCATCTA | 430
PPGB | 5476 | NM_000308 | ACAGGCTTTGGTCTTCTCTC | 431
PPGB | 5476 | NM_000308 | AACAAGCAGCCATACTGATG | 432
GBP5 | 115362 | NM_052942 | CCTACCTGATGAACAAGCTG | 433
GBP5 | 115362 | NM_052942 | ACCAAGGGAATTTGGATATG | 434
GBP5 | 115362 | NM_052942 | GAAGAAAGAGGCACAAGTGA | 435
MT1E | 4493 | NM_175617 | AAGTCTACTGCCACCTCTCAC | 436
MT1E | 4493 | NM_175617 | CCTATGGTTTCAGAACAGAGC | 437
MT1E | 4493 | NM_175617 | ACTGCCACCTCTCACTCTC | 438
FRMD3 | 257019 | NM_174938 | GAACCCACTGGTCAAGAGTT | 439
FRMD3 | 257019 | NM_174938 | TTGTGGTCTTTCAGGGAAAT | 440
FRMD3 | 257019 | NM_174938 | AACATTTCTGCTCCCTTGAT | 441
G1P3 | 2537 | NM_022873 | GGTAATATTGGTGCCCTGA | 442
G1P3 | 2537 | NM_022873 | TATTGTCCAGGCTAGAGTGC | 443
G1P3 | 2537 | NM_022873 | TCTTCTCTCCTCCAAGGTCT | 444
CEACAM1 | 634 | NM_001712 | CAACCTACCTGTGGTGGATA | 445
CEACAM1 | 634 | NM_001712 | CACTTCACAGAGTGCGTGTA | 446
CEACAM1 | 634 | NM_001712 | AGGTTCTTCTCCTTGTCCAC | 447
ZBTB38 | 253461 | XM_172341.5 | AGGGACCTCAAGGACGACTT | 448
ZBTB38 | 253461 | XM_172341.5 | ACCAACGCTGAATTTCCAAG | 449
ZBTB38 | 253461 | XM_172341.5 | ATGGAGGAGGTTCACACCAG | 450
P2RY12 | 64805 | NM_176876 | TGGTAACACCAGTCTGTGC | 451
P2RY12 | 64805 | NM_022788 | GCTGCAGAACAGAACACTTT | 452
P2RY12 | 64805 | NM_022788 | ACATTCAAACCCTCCAGAAT | 453
HIST1H4H | 8365 | NM_003543 | GTGTTCTGAAGGTGTTCCTG | 454
HIST1H4H | 8365 | NM_003543 | TAACATCCAGGGCATCACTA | 455
HIST1H4H | 8365 | NM_003543 | ATAACATCCAGGGCATCACT | 456
ITGB3 | 3690 | NM_000212 | GACTCAAGATTGGAGACACG | 457
ITGB3 | 3690 | NM_000212 | TTACCACTGATGCCAAGACT | 458
ITGB3 | 3690 | NM_000212 | AACCCTTCAGATTTGCCTTA | 459
PRG1 | 5552 | NM_002727 | CTTCCCACTTTCTGAGGACT | 460
PRG1 | 5552 | NM_002727 | GGTTCTGGAATCCTCAGTTC | 461
PRG1 | 5552 | NM_002727 | TCGAACTACTTCCAGGTGAA | 462
BRD2 | 6046 | NM_005104 | GATTCAGAGGAGGAGGAAGA | 463
BRD2 | 6046 | NM_005104 | ATGCCAGATGAACCACTAGA | 464
BRD2 | 6046 | NM_005104 | ACGGCTTATGTTCTCCAACT | 465
LTBP3 | 4054 | AF318354 | ACCTTCAGGGCTCCTATGT | 466
LTBP3 | 4054 | AF318354 | GGACAACAACATCGTCAACT | 467
LTBP3 | 4054 | AF318354 | CTCTGGCTACCATCTGTCC | 468
MAP4K3 | 84911 | NM_003618 | TGAAATTTGATCCACCCTTA | 469
MAP4K3 | 84911 | NM_003618 | TAAAGTGCATATGGGTGCAT | 470
MAP4K3 | 84911 | NM_003618 | CTGGAATATGGACAAGGACA | 471
NIPA2 | 81614 | NM_030922 | TATCAAGGAGCTGTTTGCAG | 472
NIPA2 | 81614 | NM_030922 | AACACTTCCATTGTGACTCC | 473
NIPA2 | 81614 | NM_030922 | CTCACAAGCTAGGTGATCCA | 474
SYN2 | 6854 | NM_003178 | GAACTGGAAGACGAACACTG | 475
SYN2 | 6854 | NM_003178 | CATAATGGGAGACCAAATCA | 476
SYN2 | 6854 | NM_003178 | TCTGTGCTGTCAAAGCTGTA | 477
LEF1 | 51176 | NM_016269 | GATGTCAACTCCAAACAAGGC | 478
LEF1 | 51176 | NM_016269 | TGGATTCAGGCAACCCTAC | 479
LEF1 | 51176 | NM_016269 | AGGCTGGTCTGCAAGAGACA | 480
CLEC4D | 362432 | NM_080387 | TTCACGCTGTAAGAGAGGCAC | 481
CLEC4D | 362432 | NM_080387 | GAACTGAAAAGTGCTGAAGGG | 482
CLEC4D | 362432 | NM_080387 | GGGCTGAGAGTGAAAGGAAC | 483
PADI4 | 23569 | NM_012387 | CTCCACCAGTCAAAGCTCTA | 484
PADI4 | 23569 | NM_012387 | ATGCAGGATGAAATGGAGAT | 485
PADI4 | 23569 | NM_012387 | GAAAGATCAGAGGACCTGGA | 486
TNFAIP6 | 23569 | NM_007115 | CAGGTTGCTTGGCTGATTATG | 487
TNFAIP6 | 23569 | NM_007115 | CATTAGACTCAAGTATGGTCAGCG | 488
TNFAIP6 | 23569 | NM_007115 | TACCACAGAGAAGCACGGTC | 489
MYC | 4609 | NM_002467 | GAGGCTATTCTGCCCATTTG | 490
MYC | 4609 | NM_002467 | TTTCGGGTAGTGGAAAACCAG | 491
MYC | 4609 | NM_002467 | AGTGGAAAACCAGCAGCCTC | 492
CD163 | 9332 | NM_203416 | TGAGTCTTCCTTGTGGGATTGT | 493
CD163 | 9332 | NM_203416 | TCTTCCTTGTGGGATTGTCC | 494
CD163 | 9332 | NM_203416 | CCCACAAAAAGCCACAACA | 495
Gene | Antisense Primer (3′ Primer) | SEQ ID NO: | TaqMan Probe | SEQ ID NO:
OSBPL10 | CATTTCCATGTGGTATTTGG | 496 | CAAGCTCGAAGCTGAGTCACCCA | 640
OSBPL10 | CTGCTCTGTGGAATGTGACT | 497 | CAGCCCAGCCAGAAGCCAGG | 641
OSBPL10 | TGACTTTGGTTTCTCCATTG | 498 | CCGCAGAAGTGAAGCACAACCC | 642
LOC283130 | AGGGACTCTCAGAAGTGGAG | 499 | | 642
LOC283130 | GGAAGATGTGCATGTAGCTG | 500 | CTGCTGGTGCTCACGGCCAC | 643
LOC283130 | TTACAGTGTCAAACGGGTGT | 501 | CCAGTTCCCGCAAGCCCAGA | 644
BANK1 | ACCATCTTCACCATATGCAA | 502 | TGAGGATCCCACATCCTGAGATCAA | 645
BANK1 | ACCTCTAGTGGGCTGTGTTT | 503 | CATTTGCCTCAGCTCCATCTGCA | 646
BANK1 | TGTCAAGTACAGAGCCCATT | 504 | CCCATGGTCAACTGCCATCTGAA | 647
TAGLN | TCCAGCTCCTCGTCATACTT | 505 | CTGCACTTCGCGGCTCATGC | 648
TAGLN | CCAGGGAGGAGACAGTAGAG | 506
TAGLN | GCGCTTTCTTCATAAACCA | 507 | CTGCCAAGCTGCCCAAAGCC | 649
COBLL1 | AATGACACAGCCAGTAGTGC | 508
COBLL1 | ACTCTGTCGCATCTGTTCTG | 509 | ACCAACTCCAACTGATGGCCCA | 650
COBLL1 | AATGGCTGAGTCTTGACCTT | 510
MGC24039 | TTTCTCTTCCCACTGAGACA | 511 | CCAGCCTGAGCCTTACCTGCCA | 651
MGC24039 | TCATTTCTGACCATGACACA | 512
MGC24039 | GCTGTGCAGATTTGCTTACT | 513 | CCAGGGAAGATGGTTCTCGCACC | 652
C9orf85 | TTTACACGCCACTCAAGAAC | 514 | CAGCGCTGACATACTCCATCATGAAG | 653
C9orf85 | CGTATTCTGGTGCTTCTGAG | 515 | TCTGGAACGAGCCACGTTGCC | 654
C9orf85 | AAGGTGGTCAGAATTTCACA | 516 | CGTGCCATTACACTCCAGCCTGG | 655
BLNK | AAGCTTGTCCATTCTGTTTG | 517 | CACGTCAGCAGTTCCTGGCCC | 656
BLNK | CATGATAACTCAACCTCACCA | 518 | CAGGTCAGGCAATAGAACAAGTCCACA | 657
BLNK | GAATTTGGCTTGGTTGATCT | 519 | TCCCACAGAAAGCAGTTCACCTCCA | 658
BCNP1 | TCTCCAGGGAGTGTGAAAG | 520 | CCCACTCCGGAAGCAGCTGC | 659
BCNP1 | AGAAGCTGCCAATATCCAGT | 521 | CCGACTTCCTCTGCTTGCCAGC | 660
BCNP1 | AGCAAATGTATTTCGGAAGG | 522 | TGCCCTTTCCTCCTATTTCCCTCCA | 661
PDE3B | TTCATTTCCTGTTCCACAAC | 523 | CAACACGGCCAGTTCCTGGC | 662
PDE3B | CTCAGCTGGGTCCTCTATTT | 524 | TGCTTTCTCAGGTTCCTGTAGGCCA | 663
PDE3B | TGCTTCGGGATAGTCAGTAG | 525 | AAAGGCCTCACCAAGAATTTGGCA | 664
ENC1 | GTGAGAAACATGGACGAAAG | 526 | TCGAGTTGCAAACTTTGGTCTTCCC | 665
ENC1 | GTGAGAAACATGGACGAAAG | 527 | TCGAGTTGCAAACTTTGGTCTTCCC | 666
ENC1 | TCCAGTTAATTGCAGACTCG | 528 | CAAGCCTTTCATCCTCTGTCTCCAGC | 667
AKAP13 | ATCCTGGACGAAATGAAAGT | 529 | CCACAAACACGGGAAGGCCC | 668
AKAP13 | GGGAAGAGAGATGACAAAGG | 530
AKAP13 | TCCAGAGTCCACAATAGGTG | 531 | CAGGAACGGTCTGTATTCTCCTCCTCA | 669
WDFY1 | TGATCTGCCCAGAATAATCA | 532 | TGCCTCCCGAGCATGTTCCC | 670
WDFY1 | TCAGATGCTCCTGAGAAGAG | 533 | TCGCCTGCCTCTGGTGGGAC | 671
WDFY1 | ACTTTCCAACCACTGAGGAG | 534 | TTCCGCCGTCCGAGGAACAG | 672
CDA | TTGCCCTGAAATCCTTGTA | 535 | ACCCGCTGGGCATCTGTGCT | 673
CDA | AGTTGGTGCCAAACTCTCTC | 536 | TGCTATCGCCAGTGACATGCAAGA | 674
CDA | CAGGATAGAACCTTGGGAAG | 537 | TCCTTTCCTTCCTGTGGGCCC | 675
AGTRAP | GGAGTGTGCATGGTACTGG | 538 | CCCAGCTCAGGGATTGCCTGA | 676
AGTRAP | CCTCTGCTGAGTCAATCGT | 539 | CCGCGCTCCCGGTACATGTG | 677
AGTRAP | CTCCCGGTACATGTGGTAG | 540 | CTACCCGCGGGTCAGCCTCA | 678
ACTR2 | TCTTCAATGCGGATCTTAAA | 541 | ATCCTGGCCTGCCATCACGG | 679
ACTR2 | CTCGCAACAGAAGTAGCTTG | 542 | TCCCTCCCAGCAATATCCAGTCTCC | 680
ACTR2 | GGCTGAAATAAAGCTTCTGG | 543 | CCCAACTTTGATGATACGTCCATCTGG | 681
GYG1 | TGGGTTAATGCTATTGGTTG | 544 | TCACTCTGGCAGCACTGGGCA | 682
GYG1 | GGTAAAGATGTTCCACCACA | 545 | TGAGGCCCATGATCCCAACATG | 683
GYG1 | TCTGCTCCCATATAATCAGC | 546 | TCCCAGCTATGGCACAGCCG | 684
C6orf150 | ATGGCTTTAGTCGTAGTTGCT | 547 | AGCACCCAAGAAGGCCTGCG | 685
C6orf150 | ATGCTTGGGTACAAGGTAAA | 548 | AGCACCCAAGAAGGCCTGCG | 686
C6orf150 | CCTGAGGCACTGAAGAAAG | 549 | CCCAGGTCTTTGCGGTCCCA | 687
KLRD1 | TTATCCCTAAGGTCCCAGAA | 550 | CGTGCCTTCTCTACTTCGCTCTTGGA | 688
KLRD1 | AATTGTTGACTGGAGCTCAT | 551 | TGTGCTTCTCAGAAATCCAGCCTGC | 689
KLRD1 | TCTATGTTGGGTCCTGGAGT | 552 | CGCTCTTGGAACATAATTTCTCATGGC | 690
UTS2 | GAGGTGCTGAGAGTTGAAAG | 553 | TCCACGTCTCTTTGCTTTGGCCA | 691
UTS2 | GTCTTCATGAGGTGCTGAGA | 554 | TCCACGTCTCTTTGCTTTGGCCA | 692
UTS2 | TGAGTCTGCTTTCCTGAGAA | 555
MS4A1 | ATGTTTCAGTTAGCCCAACC | 556
MS4A1 | CTTTGACCAAACACTTCCTG | 557 | CCGGATCACTCCTGGCAGCA | 693
MS4A1 | AAAGCCTATCCAAGGAACAG | 558 | TCACATTCTGAAGCACTCATTCTGCCT | 694
SPATA1 | AGTCTGTGATGTTTGCCAGA | 559 | CCCTGGCTTCAAGTTGCATGAGC | 695
SPATA1 | CCAGCTTCTTCCTGATTCTT | 560 | TGGGATCTCTTCCAAGTTCCTCCTTGA | 696
SPATA1 | CTTCAAGTTGCATGAGCAGT | 561 | CCTCCTCCATTGATGCTGTGACTTTC | 697
LIMS1 | GGCAAAGAGCATCTGAAAG | 562 | TGCGCTCAGTGCTTCCAGCA | 698
LIMS1 | TGAATATCAGAGGCTGCTCA | 563 | TGGGAGACACCTGTGTCGCCC | 699
LIMS1 | TACAGACTGGCTTCATGTCA | 564 | AGCAGTTCACGCACCAGGCC | 700
ZNF6 | CTCCAGCTACTGAATGTCCA | 565 | TTCAACATCATCTTCAGCCTCCGC | 701
ZNF6 | TTTGAAGGTACTGTGCTGCT | 566 | TGCCGCAGCCCAGACAACAG | 702
ZNF6 | CCTTCTTGCAGAATTCACAC | 567 | TCACATGTCGTTTAAAGCCAGATGCA | 703
ANK3 | GAATCATCACCCAATTCCTT | 568 | CGGTCATTGCATCTTCACTCTGCG | 704
ANK3 | TGAAAGTGATTTCCATGCTC | 569 | CGCTGCTTCCACACATTAATGGCA | 705
ANK3 | CTGGCTCTCATCTACATCCA | 570 | TGCCTTTAACAGAAATGCCTGAAGCA | 706
KIAA1559 | GCGATCTCAACTCACCACAA | 571 | TCCGCCTCCTGGATTCAAGTGA | 707
KIAA1559 | CTCAAACTCCTGGCCTCAAG | 572 | CCCTGTGCCAGGCCAAATTCA | 708
KIAA1559 | CTCAAGTGATCCACCCACCT | 573 | CCCTGTGCCAGGCCAAATTCA | 709
PPGB | GACCAGGGAGTTGTCATTCT | 574 | TCCTGCATGACCAGCACGGC | 710
PPGB | TTGTAGATGTTGAGGCCAGA | 575 | CACGATGCGGGCCACTTCCT | 711
PPGB | GGCACTTTGCTTAGAAGAGG | 576 | CAGCTCCACGGCCTGATGCA | 712
GBP5 | GATGAGGCACACACCATATC | 577 | CGGTGCAGTCTCACACCAAGGG | 713
GBP5 | AGTAAGAGTGCCAGTGCAAA | 578 | CTCCCAGGCCCTCGGTGTCA | 714
GBP5 | TGTTTGGCTATCTCCATTTG | 579 | TCTGCCTTTGAATCGCCGCC | 715
MT1E | AAGAGCTGTTCCCACATCA | 580 | CCCTGGGCACACTTGGCACA | 716
MT1E | AGTGAGAGGTGGCAGTAGAC | 581 | CGGGCACCTCCCTGCCCTAA | 717
MT1E | AGAGCTGTTCCCACATCAG | 582 | CCCTGGGCACACTTGGCACA | 718
FRMD3 | ACTGCTCAAACTCTGGTGTC | 583 | CCAGGCCCACCACAAGGAGC | 719
FRMD3 | AGTATTGAATGCCAACATGG | 584 | TCTCCTTCTGGGTGCCAATCACA | 720
FRMD3 | GGCACTTGGGTTGTACACTA | 585 | CTCCCGGGCTGCCTTCACTG | 721
G1P3 | TTAGGCCAAGAAGGAAGAAG | 586 | TGGGCTACGCCACCCACAAG | 722
G1P3 | CTACTTGGGAGGTTGAGACA | 587 | TGCAGCCTCCAACTCCTAGCCTCA | 723
G1P3 | TCTTACCTGCATCCTTACCC | 588 | TACCGCCTTCTGCCGCATGG | 724
CEACAM1 | TTCACACTCATAGGGTCCTG | 589 | TCCCAGGCTGCAGCTGTCCA | 725
CEACAM1 | GAATGGCATGGATTCAGTAG | 590 | CTTCTGGAACCCGCCCACCA | 726
CEACAM1 | GGGTAGCTTGTTGAGTTCCT | 591 | CGGTTGCCATCCACTCTTTCCC | 727
ZBTB38 | TTGAAGCTGCCAGAACATTG | 592 | CCCGAATGCGCTGCTCATTTAA | 728
ZBTB38 | TCCATTTTCAATGCCTCCTC | 593 | CAACAGTCCAGCCATCCCATTGG | 729
ZBTB38 | GGATTATGGGGCTCATGCTA | 594 | TCATGCTGCACCCTGACCGG | 730
P2RY12 | GCCATTTGTGATAAGTCCAA | 595 | CACCCAGGTCCTCTTCCCACTGC | 731
P2RY12 | TGCCAAACCTCTTTGTGATA | 596 | TGCTCTTGTAATCTGACCCTGGACATG | 732
P2RY12 | GGTGCACAGACTGGTGTTAC | 597 | CGACAACCTCACCTCTGCGCC | 733
HIST1H4H | GTAAAGAGTGCGTCCCTGTC | 598 | CGCCAAACGCAAGACCGTGA | 734
HIST1H4H | GAACACCTTCAGAACACCAC | 599 | TATCCGGCGCCTTGCTCGTC | 735
HIST1H4H | AGAACACCACGAGTCTCCTC | 600 | TATCCGGCGCCTTGCTCGTC | 736
ITGB3 | AGGCACAGTCACAATCAAAG | 601 | TGTCCTTGAAGCCCACGGGC | 737
ITGB3 | TGGAGGCAGAGTAATGATTG | 602 | CCTGCCAGCCTTCCGTCCAA | 738
ITGB3 | CAGTGAGGGTGTGGAATTAG | 603 | TGTCACCCTTAGGCCAGCACCA | 739
PRG1 | ATCCTGTTCCATTTCCGTTA | 604 | CGGCTCCGGCTCTGGATCAG | 740
PRG1 | TTGGATTCACCTGGAAGTAG | 605 | TCTGGATTGCAGCGCACCCA | 741
PRG1 | GTCCTCAGAAAGTGGGAAGA | 606 | TCCAAAGACGAGAATCCAGGACTTGA | 742
BRD2 | TGGGTTTGAATCACGTAAAG | 607 | TCCAGGCTCAGCTGCCGCTT | 743
BRD2 | ATCTTCCTCCTCCTCTTCCT | 608 | CCCTGGCTTGGCCAAATCGTC | 744
BRD2 | CCCTGGTTCTAGTGGTTCAT | 609 | CGTGCCATTGCCACAACATCG | 745
LTBP3 | CACTGTGTCATCGAAGTTCA | 610 | CCACCCAGGACCAGCACGGT | 746
LTBP3 | GCTTGCAGTAGCACTCGTAG | 611 | CCAGCCCACCGTGACATCGA | 747
LTBP3 | TGTAGGAGCCATTGGTATTG | 612 | TCCTCGCAGTGGCTCCGGTC | 748
MAP4K3 | TGGTGTCCTTGTCCATATTC | 613 | CCACATCATGAACTTCCCGACAGTG | 749
MAP4K3 | TCGGCACCAAATATCAAGTA | 614 | CACTGTGCATCATCATGGATAAACCCA | 750
MAP4K3 | CTTCTAAATGTGCGACGTGT | 615 | CCTCGCTGATGCAATTCTTCTTCAACA | 751
NIPA2 | TGGAGTCACAATGGAAGTGT | 616 | AAGCCTGTGCTGCGGCATCC | 752
NIPA2 | GCCACTCAAAGTACCAATGA | 617 | CATCGTCAACAGGCATATCTTGCCA | 753
NIPA2 | CCGATTACAGAGCAGATTGT | 618 | ATGGCGAGGACCCACCAC | 754
SYN2 | TACAGCTTTGACAGCACAGA | 619 | CCAGGCCGCCAAACATCTCA | 755
SYN2 | GGAAAGAGAAAGCTCCTCAA | 620 | CCACAGCCCTCCAGGGTCCA | 756
SYN2
CTAGTTCGGTGATGAGTTGC 621TGGCATGCTACAGTCCATGACCTCA 757 LEF1 AGTGAGGATGGGTAGGGTTG 622TTATCCCTTGTCTCCGGGTGGT 758 LEF1 TGTGGGGATGTTCCTGTTTG 623ATGTCCAGGTTTTCCCATCA 759 LEF1 GAATGAGCTTCGTTTTCCACC 624TGCATCAGGTACAGGTCCAAGA 760 CLEC4D AGTTGGACTGGAAGGCTCTC 625GAACTGAAAAGTGCTGAAGGGA 761 CLEC4D TCCTTTCACTCTCAGCCCAC 626AGCACCTGGAACTGTTGTCCT 762 CLEC4D CCACTGACCTTTGGCATTC 627ATGACCATCAGCACGGAAGC 763 PADI4 TCCAGGTCCTCTGATCTTTC 628CTACATCCAAGCCCCACACAAAAC 764 PADI4 GTTTGATGGGAAACTCCTTC 629CTACATCCAAGCCCCACACAAAA 765 PADI4 TCAAGCACTTCATCATCCTC 630TACATCCAAGCCCCACACAAAA 766 TNFAIP6 TTGATTTGGAAACCTCCAGC 631TGGCTTTGTGGGAAGATACTGTGG 767 TNFAIP6 TCCACAGTATCTTCCCACAAAG 632CAGGTTGCTTGGCTGATTATGT 768 TNFAIP6 ATCCATCCAGCAGCACAGA 633GCAGAAGCTAAGGCGGTGTGTGAA 769 MYC TCCTGTTGGTGAAGCTAACG 634TTTCGGGTAGTGGAAAACCA 770 MYC CGTCGCAGTAGAAATACGGCT 635CTATGACCTCGACTACGACTCGGT 771 MYC CGTCGCAGTAGAAATACGGCT 636ATGACCTCGACTACGACTCGGT 772 CD163 CGACCTGTTGTGGCTTTTT 637TTCAGTGCAGAAAACCCCACA 773 CD163 AGGATGACTGACGGGATGA 638CAGTGCAGAAAACCCCACAAA 774 CD163 CTTGAGGAAACTGCAAGCC 639TCATCCCGTCAGTCATCCTTTATTGC 775
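A few quick checks can be run on the primer and probe sequences listed above. The minimal Python sketch below computes three properties of an oligonucleotide: its GC fraction, a rough melting temperature by the Wallace rule (2 °C per A/T, 4 °C per G/C), and its reverse complement. The Wallace rule is only a rule of thumb for short oligonucleotides and is an assumption here. The document does not state how its primers were designed, and the helper names are hypothetical.

```python
"""Quick property checks for oligonucleotides listed in Table 14 (illustrative only)."""

# Maps each unambiguous DNA base to its Watson-Crick partner.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def gc_fraction(seq: str) -> float:
    """Fraction of bases that are G or C."""
    s = seq.upper()
    return (s.count("G") + s.count("C")) / len(s)


def wallace_tm(seq: str) -> int:
    """Rule-of-thumb melting temperature in deg C: 2*(A+T) + 4*(G+C)."""
    s = seq.upper()
    at = s.count("A") + s.count("T")
    gc = s.count("G") + s.count("C")
    return 2 * at + 4 * gc


if __name__ == "__main__":
    oligos = {
        "MYC sense primer, SEQ ID NO: 490": "GAGGCTATTCTGCCCATTTG",
        "LEF1 antisense primer, SEQ ID NO: 622": "AGTGAGGATGGGTAGGGTTG",
    }
    for label, seq in oligos.items():
        print(f"{label}: GC={gc_fraction(seq):.2f}, "
              f"Wallace Tm={wallace_tm(seq)} C, "
              f"reverse complement={reverse_complement(seq)}")
```

An antisense primer anneals to the sense strand, so its reverse complement is the stretch that would be expected to appear in the transcript sequence.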

TABLE 15
Gene Symbol | Protein Description | Commercially Available Antibody Reference | Scientific Reference
OSBPL10 | oxysterol binding protein-like 10 | |
LOC283130 | hypothetical protein LOC283130 | |
BANK1 | B-cell scaffold protein with ankyrin repeats 1 | |
TAGLN | Transgelin (alternative name smooth muscle 22 protein, SM22) | Abcam Ab14106 AntiHuman Rabbit Polyclonal Antibody | Nishida W et al. Gene cloning and nucleotide sequence of SM22 alpha from the chicken gizzard smooth muscle. Biochem Int 23: 663-8 (1991).
COBLL1 | COBL-like 1 | |
MGC24039 | hypothetical protein MGC24039 | |
C9orf85 | chromosome 9 open reading frame 85 | |
BLNK | B-cell linker | Abcam Ab4474 AntiHuman Rabbit Polyclonal Antibody | Kabak S et al. The direct recruitment of BLNK to immunoglobulin alpha couples the B-cell antigen receptor to distal signaling pathways. Mol Cell Biol 22: 2524-35 (2002).
BCNP1 | B-cell novel protein 1 | |
PDE3B | phosphodiesterase 3B, cGMP-inhibited | Santa Cruz Biotechnology AntiHuman Rabbit Polyclonal Antibody | Liu H and Maurel DH. Expression of cyclic GMP-inhibited phosphodiesterase 3A and 3B (PDE3A and PDE3B) in rat tissues: differential subcellular localization and regulated expression. Br J Pharmacol 125: 1501-10 (1998).
ENC1 | ectodermal-neural cortex (with BTB-like domain) | |
AKAP13 | A kinase (PRKA) anchor protein 13 | Other family antibodies available, including: Abcam Ab10346 AKAP12 AntiRat Rabbit Polyclonal Antibody; Abcam Ab25805 AKAP9 AntiHuman Rabbit Polyclonal Antibody; Abcam Ab14096 AKAP3 Goat Polyclonal Antibody |
WDFY1 | WD repeat and FYVE domain containing 1 (aka FENS-1) | Abcam Ab21695 AntiHuman Goat Polyclonal Antibody |
CDA | cytidine deaminase | Abcam Ab5197 Antihuman Rabbit Polyclonal Antibody | Duquette ML et al. AID binds to transcription-induced structures in c-MYC that map to regions associated with translocation and hypermutation. Oncogene 24: 5791-8 (2005).
AGTRAP | angiotensin II receptor-associated protein | |
ACTR2 | ARP2 actin-related protein 2 homolog (yeast) | |
GYG | glycogenin 1 | |
C6orf150 | chromosome 6 open reading frame 150 | |
KLRD1 | killer cell lectin-like receptor subfamily D, member 1 | Abcam Ab19740 CD94 Ab (aka KLRD1 Ab) AntiHuman Mouse Monoclonal Antibody | Moretta A et al. Human natural killer cell receptors for HLA-class I molecules. Evidence that the Kp43 (CD94) molecule functions as receptor for HLA-B alleles. J Exp Med 180: 545-55 (1994).
UTS2 | urotensin 2 | Abcam Ab14200 Antihuman Rabbit Polyclonal Antibody |
MS4A1 | membrane-spanning 4-domains, subfamily A, member 1 | Abcam Ab9475 AntiHuman Mouse Monoclonal Antibody | Mason DY et al. Antibody L26 recognizes an intracellular epitope on the B-cell-associated CD20 antigen. Am J Pathol 136: 1215-22 (1990).
SPAP1 | Fc receptor-like 2 | |
LIMS1 | LIM and senescent cell antigen-like domains 1 | |
ZNF6 | zinc finger protein 6 (CMPX1) | |
ANK3 | ankyrin 3, node of Ranvier (ankyrin G) | |
KIAA1559 | mouse zinc finger protein 14-like | |
PPGB | protective protein for beta-galactosidase (galactosialidosis) | |
GBP5 | guanylate binding protein 5 | |
MT1E | metallothionein 1E (functional) | |
MGC20553 (FRMD3) | FERM domain containing 3 | |
G1P3 | interferon, alpha-inducible protein (clone IFI-6-16) | |
CEACAM1 | carcinoembryonic antigen-related cell adhesion molecule 1 (biliary glycoprotein) | Abcam Ab26279 Antihuman Mouse Monoclonal Antibody |
FLJ35036 (ZBTB38) | zinc finger and BTB domain containing 38 | |
P2RY12 | purinergic receptor P2Y, G-protein coupled, 12 | Alomone Labs #APR-012 AntiHuman Mouse Polyclonal Antibody | Queiroz G et al. J Pharmacol Exp Ther 307: 809 (2003).
HIST1H4H | histone 1, H4h | |
ITGB3 | integrin, beta 3 (platelet glycoprotein IIIa, antigen CD61) | Abcam Ab7167 AntiHuman Mouse Monoclonal Antibody | Pittier R et al. Neurite extension and in vitro myelination within three-dimensional modified fibrin matrices. J Neurobiol 63: 1-14 (2005); Soldi R et al. Role of alphavbeta3 integrin in the activation of vascular endothelial growth factor receptor-2. EMBO J 18: 882-92 (1999).
PRG1 | proteoglycan 1, secretory granule | |
BRD2 | bromodomain containing 2 | Abcam Ab19276 Antihuman Rabbit Polyclonal Antibody |
LTBP3 | latent transforming growth factor beta binding protein 3 | Abcam Ab21621 Antibovine Rabbit Polyclonal |
MAP4K3 | mitogen-activated protein kinase kinase kinase kinase 3 | |
NIPA2 | non imprinted in Prader-Willi/Angelman syndrome 2 | |
SYN2 | synapsin II | Abcam Ab12240 Antihuman Rabbit Monoclonal |
LEF1 | lymphoid enhancer-binding factor 1 | Abcam Ab22884 Antihuman Rabbit Polyclonal; Ab12037 Antihuman Mouse Monoclonal |
CLEC4D | C-type lectin domain family 4, member d | |
PADI4 | peptidyl arginine deiminase, type IV | Abcam Ab26071 Antihuman Goat Polyclonal |
TNFAIP6 | tumor necrosis factor, alpha-induced protein 6 | Abcam Ab36380 Antihuman Chicken Polyclonal |
MYC | v-myc myelocytomatosis viral oncogene homolog (avian) | Abcam Ab1383 Antihuman Rabbit Polyclonal |

TABLE 16
Symbol | 5′ Primer (Sense Primer) Sequence | Position | SEQ ID NO: | 3′ Primer (Antisense Primer) Sequence | Position | SEQ ID NO: | Product
LOC283130 | AAGGCCAGAATCCCAGCTCAG | 934 | 776 | ATCCATCTGCATCCGGGACTTGAT | 1044 | 818 | 111
MGC24039 | AACAAGGGATCGCCTGCTCC | 3717 | 777 | ATAAGGGAGTTGACAGTCATGCGG | 3833 | 819 | 117
ACTR2 | ACCGGGTTTGTGAAGTGTGGAT | 109 | 778 | CACTTGCCTCATCACCAACCAT | 253 | 820 | 145
AGTRAP | CATGGCCATCCTCAGCTTGC | 325 | 779 | AAGACCCAAGGAAACCAGTGTGGA | 434 | 821 | 110
AKAP13 | TCTCAGCCCGGTGATGGTC | 8562 | 780 | TGTAAGAGACTTGTGCACGCGG | 8710 | 822 | 149
ANK3 | ATACGCCATTACATCAAGCAGCAC | | 781 | CCAAGGGCAGTATTCCCATTCACA | | 823 |
BANK1 | TGCTGAAAGGCATGGTCACAAAG | 1303 | 782 | GCTGGGTTCTGTGTGGAAGGAATA | 1449 | 824 | 147
BCNP1 | GGCGCGTGCTGAAGAAATTCAA | 1664 | 783 | TTTGCAGCCTGGCTCGAGTTG | 1782 | 825 | 119
BLNK | TTTCAGAACAGGAAGCTGGCGT | 1172 | 784 | GTTGTTTGGAATCATGGCCAGAGC | 1315 | 826 | 144
BRD2 | TGCCTATGCTTGGCCTTTCT | 2799 | 785 | ATCTTCCGCTTGACAGTGCTGA | 2909 | 827 | 111
C6orf150 | CAAGAAGGCCTGCGCATTCAAA | 1098 | 786 | AGCCGCCATGTTTCTTCTTGGA | 1225 | 828 | 128
C9orf85 | TTATTCCGTTGAATAAAGAAACAGA | 545 | 787 | TTTGATGGTCTCCTCCTGTG | 694 | 829 | 150
CDA | ATCGCCAGTGACATGCAAGA | 376 | 788 | TACCATCCGGCTTGGTCATGTA | 484 | 830 | 109
CEACAM1 | ATTGGAGTAGTGGCCCTGGTTG | | 789 | ATTGGAGTGGTCCTGAGTGTGGT | | 831 |
COBLL1 | AGGAAGAGTGAGGGCAGGTTCA | 375 | 790 | GCTGTAAGGCAGTCACACGACTAT | 523 | 832 | 149
ENC1 | CATGAGCTCACTCCATCACTCGAT | 2196 | 791 | AGCATTTACAAGGTGCAGCAGAT | 2325 | 833 | 130
FLJ35036 | CTCCGAGTTGTCTTGAAGTGAGG | | 792 | TTGGCAAAGATTGGGCAGCAAG | | 834 |
G1P3 | CCTCCAAGGTCTAGTGACGGA | 68 | 793 | CCCACTGCAAGTGAAGAGCA | 167 | 835 | 100
GBP5 | CTGACTCTGCGAGCTTCTTCC | | 794 | GATCACTACCTTGCTTTGGCCTT | | 836 |
GYG | GGGACCAAGGCATACTGAACACA | 538 | 795 | TGGCACTTGCACCAAACACTTT | 678 | 837 | 141
HIST1H4H | CACTTACACAGAGCACGCCAAA | | 796 | TTAGCCACCGAAGCCGTAAAGA | | 838 |
ITGB3 | AGCTCATTGTTGATGCTTATGGGA | 1118 | 797 | ATACAAGACTTGAGGCCAGGGA | 1232 | 839 |
KIAA1559 | TGTGGGAGAACTACAGCAAC | 95 | 798 | TGGACTCCAAATCAGGGCAGTA | 238 | 840 | 144
KLRD1 | CCCAGTATCTATTTCCATCATTTG | 670 | 799 | TCTCTGCCCCAAGAAACATT | 819 | 841 | 150
LIMS1 | AGGTGATGTGGTCTCTGCTCTT | | 800 | CAGACTGGCTTCATGTCAAACTCC | | 842 |
LTBP3 | TCTGCATCAACTTTCCCGGTCA | 2024 | 801 | TTGTTCTCGCATTTGCCATCCG | 2162 | 843 | 139
MAP4K3 | TGAACTTCCCGACAGTGATGGT | 1141 | 802 | AACCACCTTGGTGTCCTTGT | 1250 | 844 | 110
MGC20553 | AGGTGCACAGAGCCAACATTAC | | 803 | AATGGAACACCCTCACCCAGA | | 845 |
MS4A1 | AAAGAACGTGCTCCAGACCC | | 804 | TTCAGTTAGCCCAACCACTTCTTC | | 846 |
MT1E | CTTGTTCGTCTCACTGGTGT | 1 | 805 | ACTCTTCTTGCAGGAGGTGCAT | 142 | 847 | 142
NIPA2 | CTGGACTGCTGTCAATGGGA | 471 | 806 | ACTTACTAGCACGCTGAGAGC | 577 | 848 | 107
OSBPL10 | ATGGAGTCCAGGAACCTCTGG | 2467 | 807 | TTTGGGCTTCCATGGTGTGC | 2616 | 849 | 150
P2RY12 | GAACACTTTCTCATGTCCAGGGT | | 808 | CCTGCAGAGTGGCATCTGGTATTT | | 850 |
PDE3B | TGAGCAGGGAGATGAAGAAGCAA | 3171 | 809 | GCAAACCAGCAGCATCATAGGAG | 3316 | 851 | 146
PPGB | GACACTGTTGTGGTCCAGGATTTG | | 810 | TGGAAGCAGCTGTTGTGTTGG | | 852 |
PRG1 | TACTCAAATGCAGTCGGCTTGTCC | | 811 | ACCCATTGGTACCTGGCTCTCT | | 853 |
SPAP1 | AGCCAGTGTATGTCAATGTGGG | | 812 | GGAGTCCTTGTTCTCCAGAAGTGT | | 854 |
SYN2 | ACAGCTCAACAAGTCGCAGT | 1470 | 813 | AAAGAGGCTGGCAAAGGACT | 1605 | 855 | 136
TAGLN | TGAAGGCAAAGACATGGCAG | 429 | 814 | TTCCCTCTTATGCTCCTGCG | 561 | 856 | 133
UTS2 | AAGCCGTCTATCTTGTGGCGAT | 10 | 815 | CGTCTTCATGAGGTGCTGAGAGTT | 150 | 857 | 141
WDFY1 | AAGGACATGAAGGTAGTGTCGCCT | 632 | 816 | ATGGCCCTGAAGTAACAGCGT | 768 | 858 | 137
ZNF6 | TGCAGGGCTGATCTGGGTCT | | 817 | CTTCCACCGCCTGAATCCATACTT | | 859 |
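Every product size listed in Table 16 equals the antisense position minus the sense position plus one. For example, LOC283130 gives 1044 - 934 + 1 = 111. This suggests that both positions are coordinates on the same transcript and that the product is the inclusive span between them, although the table itself does not say so. The minimal sketch below recomputes a subset of rows under that assumed interpretation. The row selection and function names are hypothetical.

```python
"""Recompute Table 16 product sizes from the listed primer positions (illustrative only)."""

# symbol -> (5' primer position, 3' primer position, listed product size),
# transcribed from Table 16.
TABLE16_ROWS = {
    "LOC283130": (934, 1044, 111),
    "MGC24039": (3717, 3833, 117),
    "ACTR2": (109, 253, 145),
    "BANK1": (1303, 1449, 147),
    "BCNP1": (1664, 1782, 119),
    "C9orf85": (545, 694, 150),
    "CDA": (376, 484, 109),
    "MT1E": (1, 142, 142),
    "SYN2": (1470, 1605, 136),
    "UTS2": (10, 150, 141),
}


def amplicon_length(sense_pos: int, antisense_pos: int) -> int:
    """Inclusive span between the two primer positions (assumed interpretation)."""
    if antisense_pos < sense_pos:
        raise ValueError("antisense position must be downstream of the sense position")
    return antisense_pos - sense_pos + 1


def mismatched_rows(rows: dict) -> list:
    """Symbols whose listed product size disagrees with the recomputed span."""
    return [sym for sym, (s, a, listed) in rows.items()
            if amplicon_length(s, a) != listed]


if __name__ == "__main__":
    print(mismatched_rows(TABLE16_ROWS) or "all listed product sizes agree")
```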

TABLE 17
Symbol | RefSeq | Affy ID | 5′ Primer Sequence | SEQ ID NO: | TaqMan Probe | SEQ ID NO: | 3′ Primer Sequence | SEQ ID NO:
Bank1 | NM_017935 | 219667_s_at | CTGAAAGGCATGGTCACAAA | 860 | TCCTTCCACACAGAACCCAGCA | 866 | TCAGCTCCATCTGCACTCTG | 872
BCNP1 | NM_173544 | 230983_at | CGCGTGCTGAAGAAATTCAA | 861 | TTGGCGCAGAGGAGGTTCAT | 867 | ACTCAGGCAGCTCCTTTTTG | 873
CD163 | NM_004244; NM_203416 | 203645_s_at; 215049_x_at | GCAGCACATGGGAGATTGTCCTGTAA | 862 | AGCAAGTGGCCTCTGTAATCTGCTCA | 868 | ATTGCACGAGGACAGTGTTTGGGA | 874
CDA | NM_001785 | 205627_at | GCCGTCTCAGAAGGGTACAA | 863 | CAGGGCAATTGCTATCGCCA | 869 | CCAGTTGGTGCCAAACTCTC | 875
MGC20553 | NM_174938 | 229893_at | AGGTGCACAGAGCCAACATTAC | 864 | CTCATCATTAACATGGAACCCCTGC | 870 | AATGGAACACCCTCACCCAGA | 876
MS4A | NM_152866 | 217418_x_at | AAAGAACGTGCTCCAGACCC | 865 | CATAGTTCTCCTGTCAGCAGAAGA | 871 | TTCAGTTAGCCCAACCACTTCTTC | 877
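Several Table 17 assays share primer sequence with the Table 16 primers for the same gene. The small sketch below checks, for each gene present in both tables, whether one primer sequence contains the other. The sequences are transcribed from Tables 16 and 17. Keying the Table 17 rows "Bank1" and "MS4A" as BANK1 and MS4A1 is an assumption, and the function names are hypothetical.

```python
"""Cross-check Table 17 assays against Table 16 primers (illustrative only)."""
from typing import Dict, Tuple

Primers = Tuple[str, str]  # (5' primer, 3' primer)

# Transcribed from Table 16.
TABLE16: Dict[str, Primers] = {
    "BANK1": ("TGCTGAAAGGCATGGTCACAAAG", "GCTGGGTTCTGTGTGGAAGGAATA"),
    "BCNP1": ("GGCGCGTGCTGAAGAAATTCAA", "TTTGCAGCCTGGCTCGAGTTG"),
    "MGC20553": ("AGGTGCACAGAGCCAACATTAC", "AATGGAACACCCTCACCCAGA"),
    "MS4A1": ("AAAGAACGTGCTCCAGACCC", "TTCAGTTAGCCCAACCACTTCTTC"),
}

# Transcribed from Table 17. Its "Bank1" and "MS4A" rows are keyed as
# BANK1 and MS4A1 here, which is an assumption.
TABLE17: Dict[str, Primers] = {
    "BANK1": ("CTGAAAGGCATGGTCACAAA", "TCAGCTCCATCTGCACTCTG"),
    "BCNP1": ("CGCGTGCTGAAGAAATTCAA", "ACTCAGGCAGCTCCTTTTTG"),
    "MGC20553": ("AGGTGCACAGAGCCAACATTAC", "AATGGAACACCCTCACCCAGA"),
    "MS4A1": ("AAAGAACGTGCTCCAGACCC", "TTCAGTTAGCCCAACCACTTCTTC"),
}


def shares_sequence(a: str, b: str) -> bool:
    """True if one primer sequence is contained within the other."""
    return a in b or b in a


def compare_tables(t16: Dict[str, Primers],
                   t17: Dict[str, Primers]) -> Dict[str, Tuple[bool, bool]]:
    """For each gene in both tables, report (5' primer shared, 3' primer shared)."""
    out = {}
    for symbol in sorted(t16.keys() & t17.keys()):
        (f16, r16), (f17, r17) = t16[symbol], t17[symbol]
        out[symbol] = (shares_sequence(f16, f17), shares_sequence(r16, r17))
    return out


if __name__ == "__main__":
    for symbol, (fwd, rev) in compare_tables(TABLE16, TABLE17).items():
        print(f"{symbol}: 5' primer shared={fwd}, 3' primer shared={rev}")
```

For this subset, the Table 17 5′ primers for BANK1 and BCNP1 lie within the longer Table 16 primers, but their 3′ primers differ. MGC20553 and MS4A1 use identical primer pairs in both tables.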

1-34. (canceled)
 35. A method of determining a probability of whether a human test subject is more likely to have colorectal cancer than to not have colorectal cancer comprising: (a) amplifying a test cDNA from blood of the human test subject complementary to a test mRNA to obtain values for levels of test mRNA expressed for each gene of a set of genes consisting of MS4A1 and one or more genes selected from the group consisting of MGC20553, CD163, CDA, BANK1, and BCNP1, comprising the steps of: (i) generating test cDNA from test mRNA expressed specifically by each of the genes of the set of genes from blood of the human test subject; and (ii) reacting the test cDNA under conditions to amplify DNA with an appropriate primer pair comprising a first and a second primer, wherein the first and the second primer of each primer pair represents a pair of forward and reverse primers corresponding to MS4A1, MGC20553, CD163, CDA, BANK1 and BCNP1 respectively, to obtain the values of the levels of the test mRNAs expressed specifically by the genes of the set of genes; and (b) applying to the values of the levels of the test mRNAs a mathematical model formulated using logistic regression analysis of levels of control cDNA generated from RNA encoded by the set of genes in blood of human control subjects having colorectal cancer and levels of control cDNA generated from control RNA encoded by the set of genes in blood of human control subjects not having colorectal cancer, wherein the mathematical model is formulated for determining the probability that a test subject has colorectal cancer as opposed to not having colorectal cancer.
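Step (b) of claim 35 applies a logistic-regression model built from control subjects with and without colorectal cancer. Such a model gives P = 1 / (1 + exp(-(b0 + b1·x1 + ... + bk·xk))), where each x is a gene's expression value. The minimal, dependency-free Python sketch below covers both halves of the step. It fits an intercept plus one coefficient per gene by batch gradient ascent on the log-likelihood, then converts a test subject's values into a probability. The two-gene layout mirrors claim 36 (MS4A1 and MGC20553). The training values are synthetic placeholders, not data from this document, and gradient ascent is only one of several ways to fit such a model. All function names are hypothetical.

```python
"""Logistic-regression sketch for claim 35, step (b) (illustrative only)."""
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function 1 / (1 + e^-z)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def linear_score(w: Sequence[float], x: Sequence[float]) -> float:
    """b0 + b1*x1 + ... + bk*xk, with w[0] as the intercept b0."""
    return w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))


def fit_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                 lr: float = 0.1, epochs: int = 5000) -> List[float]:
    """Fit intercept + one coefficient per gene by batch gradient ascent
    on the log-likelihood; y is 1 for colorectal cancer, 0 otherwise."""
    k = len(X[0])
    w = [0.0] * (k + 1)
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * (k + 1)
        for xi, yi in zip(X, y):
            err = yi - sigmoid(linear_score(w, xi))
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj + lr * gj / n for wj, gj in zip(w, grad)]
    return w


def probability_of_crc(w: Sequence[float], x: Sequence[float]) -> float:
    """Model probability that a test subject has colorectal cancer."""
    return sigmoid(linear_score(w, x))


if __name__ == "__main__":
    # Synthetic placeholder values per subject: (MS4A1 level, MGC20553 level).
    X = [(1.0, 2.0), (1.5, 2.5), (2.0, 1.5), (0.5, 1.0),        # labeled cancer
         (-1.0, -0.5), (-1.5, -2.0), (-0.5, -1.0), (0.0, -1.5)]  # labeled no cancer
    y = [1, 1, 1, 1, 0, 0, 0, 0]
    w = fit_logistic(X, y)
    print("coefficients:", [round(v, 3) for v in w])
    print("P(CRC) for test subject:", round(probability_of_crc(w, (1.0, 1.0)), 3))
```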
 36. The method of claim 35, wherein the set of genes consists of MS4A1 and MGC20553.
 37. The method of claim 35, wherein the appropriate primer pairs are selected from the primers listed in Table 14.
 38. The method of claim 35, wherein the value of the levels of control cDNA generated from RNA encoded by the set of genes in blood of human subject having colorectal cancer, and not having colorectal cancer, was obtained by the method comprising: (i) generating control cDNA from the RNA expressed by the genes of the set of genes from the blood of each of the human control subjects; and (ii) reacting the control cDNA under conditions to amplify DNA with appropriate primers, wherein the first and the second primer of each primer pair represents a pair of forward and reverse primers corresponding to MS4A1, MGC20553, CD163, CDA, BANK1, and BCNP1 respectively, to obtain the values of the levels of the control mRNAs expressed by the set of genes.
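The claims do not specify how an amplification readout becomes a "value of the level" of an mRNA. One widely used convention for real-time PCR is the comparative Ct method, which is assumed here and is not stated in the claims. The cycle threshold (Ct) of each target gene is normalized to a reference gene measured in the same sample, giving ΔCt. The relative level is then (1 + E)^-ΔCt, where E is the amplification efficiency (E = 1 means perfect doubling per cycle). The reference gene "REF", the Ct values, and the function names below are hypothetical.

```python
"""Comparative-Ct sketch for turning amplification readouts into levels (assumed convention)."""
from typing import Dict, List


def delta_ct(ct_target: float, ct_reference: float) -> float:
    """Normalize a target-gene Ct to a reference-gene Ct from the same sample."""
    return ct_target - ct_reference


def relative_level(ct_target: float, ct_reference: float,
                   efficiency: float = 1.0) -> float:
    """Relative expression (1 + E) ** -dCt; a lower Ct means more starting template."""
    return (1.0 + efficiency) ** (-delta_ct(ct_target, ct_reference))


def level_matrix(samples: List[Dict[str, float]], genes: List[str],
                 reference: str) -> List[List[float]]:
    """One row of dCt values per control subject, one column per gene."""
    return [[delta_ct(s[g], s[reference]) for g in genes] for s in samples]


if __name__ == "__main__":
    # Hypothetical Ct values for two control subjects; "REF" is a placeholder reference gene.
    controls = [
        {"MS4A1": 27.0, "MGC20553": 30.5, "REF": 20.0},
        {"MS4A1": 25.5, "MGC20553": 29.0, "REF": 19.5},
    ]
    print(level_matrix(controls, ["MS4A1", "MGC20553"], "REF"))
    print(relative_level(27.0, 20.0))
```

Rows built this way could serve as the per-subject inputs to a logistic-regression fit of the kind recited in claim 35.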
 39. The method of claim 36, wherein the value of the levels of control cDNA generated from RNA encoded by the set of genes in blood of human subject having colorectal cancer, and not having colorectal cancer, was obtained by the method comprising: (i) generating control cDNA from the RNA expressed by the genes of the set of genes from the blood of each of the human control subjects; and (ii) reacting the control cDNA under conditions to amplify DNA with appropriate primers, wherein the first and the second primer of each primer pair represents a pair of forward and reverse primers corresponding to MS4A1 and MGC20553 respectively, to obtain the values of the levels of the control mRNAs expressed by the set of genes.
 40. The method of claim 37, wherein the value of the levels of control cDNA generated from RNA encoded by the set of genes in blood of human subject having colorectal cancer, and not having colorectal cancer, was obtained by the method comprising: (i) generating control cDNA from the RNA expressed by the genes of the set of genes from the blood of each of the human control subjects; and (ii) reacting the control cDNA under conditions to amplify DNA with appropriate primer pairs, wherein the first and the second primer of each primer pair represents a pair of forward and reverse primers corresponding to MS4A1, MGC20553, CD163, CDA, BANK1, and BCNP1 respectively, to obtain the values of the levels of the control mRNAs expressed by the set of genes.
 41. The method of claim 35, wherein the conditions to amplify DNA comprise: (a) combining the test cDNA, the set of primer pairs, a thermostable DNA polymerase, and a plurality of free nucleotides comprising adenine, thymine, cytosine, and guanine in a reaction mixture; (b) heating the reaction mixture to a first predetermined temperature for a first predetermined time to separate the strands of the test cDNA from each other; (c) cooling the reaction mixture to a second predetermined temperature for a second predetermined time under conditions to allow the first and second primers to hybridize with their complementary sequences on the first and second strands of the test cDNA, and to allow the polymerase to extend the primers; and (d) repeating steps (b) and (c).
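Claim 41 recites a two-step cycle: heat to separate the strands (b), cool to hybridize the primers and extend them (c), and repeat (d). The minimal Python sketch below models that loop under the idealized assumption that each cycle multiplies the template by (1 + E), where E is the amplification efficiency. The temperatures and hold times are hypothetical placeholders, since the claim leaves them as "predetermined" values. The names are illustrative only.

```python
"""Idealized model of the cycling steps recited in claim 41 (illustrative only)."""
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Step:
    name: str
    temperature_c: float  # the claim's "predetermined temperature"
    seconds: float        # the claim's "predetermined time"


# Hypothetical settings; claim 41 does not specify temperatures or times.
DENATURE = Step("heat to separate strands, step (b)", 95.0, 15.0)
ANNEAL_EXTEND = Step("cool to hybridize and extend, step (c)", 60.0, 60.0)


def amplify(initial_copies: float, cycles: int, efficiency: float = 1.0) -> List[float]:
    """Copy number after each cycle; efficiency 1.0 means perfect doubling."""
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must lie in [0, 1]")
    copies = [initial_copies]
    for _ in range(cycles):  # step (d): repeat steps (b) and (c)
        copies.append(copies[-1] * (1.0 + efficiency))
    return copies


def protocol_seconds(cycles: int) -> float:
    """Total hold time of the repeated steps, ignoring ramping between temperatures."""
    return cycles * (DENATURE.seconds + ANNEAL_EXTEND.seconds)


if __name__ == "__main__":
    for step in (DENATURE, ANNEAL_EXTEND):
        print(f"{step.name}: {step.temperature_c} C for {step.seconds} s")
    print("copies:", amplify(10.0, 5))
    print("hold time for 40 cycles (s):", protocol_seconds(40))
```

Real reactions plateau as reagents are consumed, so the exponential model only describes the early cycles. That early phase is where threshold-based quantification is typically read.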